Classifying Cancers 



Reference to Material Presented in Appendix 

This patent application includes material comprising tables and data presented as 
Appendix A on CD-ROM. The one file on the accompanying CD-ROM is entitled 
AppendixA.xls (2,868 kb), which is a Microsoft Excel Worksheet. The CD-ROM was 
created on August 2, 2001. The format is IBM-PC. The operating system is MS-Windows 
98. The file on the CD-ROM is incorporated herein by reference. 



Background of the Invention 

Cancer is the second leading cause of death in the United States after cardiovascular 
disease (Boring et al. Cancer J. Clin. 43:7, 1993; incorporated herein by reference). One in 
three Americans will develop cancer in his or her lifetime, and one of every four Americans 
will die of cancer. In order to better combat this deadly disease, efforts have recently focused 
on fine tuning the categorization of tumors; by categorizing cancers, physicians hope to better 
treat an individual's cancer by providing more effective treatments. Researchers and 
physicians have categorized cancers based on invasion, metastasis, gross pathology, 
microscopic pathology, immunohistochemical markers, and molecular markers. With the 
recent advances in gene chip technology, researchers are increasingly focusing on the 
categorization of tumors based on the expression of marker genes. 

The most common human cancers are malignant neoplasms of the skin (Hall et al. J. 
Am. Acad. Dermatol. 40:35-42, 1999; Weyers et al. Cancer 86:288-299, 1999; each of which 
is incorporated herein by reference). The incidence of cutaneous melanoma is rising 
especially steeply, with minimal progress in non-surgical treatment of advanced disease 
(Byers et al. Hematol. Oncol. Clin. North Am. 12:717-735, 1998; McMasters et al. Ann. Surg. 
Oncol. 6:467-475, 1999; each of which is incorporated herein by reference). Despite 
significant effort to identify independent predictors of melanoma outcome, no accepted 
histopathological, molecular, or immunohistochemical marker defines subsets of this 
neoplasm (Weyers et al. Cancer 86:288-299, 1999; Byers et al. Hematol. Oncol. Clin. North 
Am. 12:717-735, 1998; each of which is incorporated herein by reference). Accordingly, 
though melanoma is thought to present with different "taxonomic" forms, these are 
considered part of a continuous spectrum rather than discrete entities (Weyers et al. Cancer 
86:288-299, 1999; incorporated herein by reference). Improved characterization and 
understanding of this potentially deadly disease would be valuable. 

Summary of the Invention 

The present invention provides a system for diagnosing aggressive forms of malignant 
melanoma based on the expression of certain marker genes within a tumor sample. In one 
embodiment, expression levels are determined for one or more of the following genes: 
Wnt5a (Seq. ID No.: 1, 2, & 3), MART-1 (Seq. ID No.: 4 & 5), pirin (Seq. ID No.: 6 & 7), 
HADHB (Seq. ID No.: 8 & 9), CD63 (Seq. ID No.: 10 & 11), EDNRB (Seq. ID No.: 12 & 
13), PGAM1 (Seq. ID No.: 14 & 15), HXB (Seq. ID No.: 16 & 17), RXRA (Seq. ID No.: 
18 & 19), integrin 1b (Seq. ID No.: 20 & 21), syndecan 4 (Seq. ID No.: 22 & 23), 
tropomyosin 1 (Seq. ID No.: 24 & 25), AXL (Seq. ID No.: 26 & 27), EphA2 (Seq. ID No.: 
28 & 29), GAP43 (Seq. ID No.: 30 & 31), PFKL (Seq. ID No.: 32 & 33), synuclein a (Seq. 
ID No.: 34 & 35), annexin A2 (Seq. ID No.: 36 & 37), CD20 (Seq. ID No.: 38 & 39), and 
RAB2 (Seq. ID No.: 40 & 41). In certain preferred embodiments, expression of a plurality 
of these genes is detected. In particularly preferred embodiments, Wnt5a is one of the genes 
whose expression is detected. According to the present invention, overexpression of Wnt5a 
in a tumor sample indicates a more aggressive form of the disease. 

The present invention also provides a system for selecting a treatment protocol for a 
patient diagnosed with malignant melanoma based on the expression pattern of certain 
marker genes in a tumor sample. For example, tumors overexpressing Wnt5a may be treated 
more aggressively or with specific agents such as inhibitors of Wnt5a expression. Inhibitors 
of Wnt5a activity include anti-sense agents, RNA inhibition agents, small molecule inhibitors 
of Wnt5a activity, gene therapy, etc. 

In another aspect, the present invention provides a system for identifying and then 
treating aggressive forms of malignant melanoma by administering inhibitors of Wnt5a 
activity to a subject. 

In another aspect, the present invention provides a system for identifying compounds 
useful in the treatment of cancer, particularly aggressive forms of malignant melanoma 
expressing Wnt5a. In the inventive method, a cell expressing Wnt5a is contacted with an 
agent being screened for activities useful in the treatment of cancer, such as decreasing or 
inhibiting Wnt5a expression and/or activity. The agent may be a polynucleotide, protein, 
peptide, natural product, small molecule, etc. The level of Wnt5a expression or activity may 
be assayed using any available technique, including but not limited to, Northern blot analysis, 
enzyme activity, expression of a reporter gene, etc. 

The present invention also provides kits useful in diagnosing or identifying cancers or 
more aggressive forms of cancer. The kits may be used to identify more aggressive forms of 
malignant melanoma. The kit may include a gene chip with nucleic acid sequences of genes 
of interest including Wnt5a, MART-1, pirin, HADHB, CD63, EDNRB, PGAM1, HXB, 
RXRA, integrin 1b, syndecan 4, tropomyosin 1, AXL, EphA2, GAP43, PFKL, synuclein a, 
annexin A2, CD20, and RAB2, or a subset thereof. The kit may also or alternatively include 
primers, enzymes, and reagents for identifying, amplifying, labeling, or sequencing nucleic 
acids. Some kits may also include reagents for purifying nucleic acids such as mRNA. 
Rather than detecting gene expression, the kit may be used to determine protein levels and 
may therefore include antibodies directed against the proteins encoded by the genes Wnt5a, 
MART-1, pirin, HADHB, CD63, EDNRB, PGAM1, HXB, RXRA, integrin 1b, syndecan 4, 
tropomyosin 1, AXL, EphA2, GAP43, PFKL, synuclein a, annexin A2, CD20, and RAB2, or 
a subset thereof. 

Definitions 

"Animal": The term animal, as used herein, refers to humans as well as non-human 
animals, including, for example, mammals, birds, reptiles, amphibians, and fish. Preferred 
non-human animals are mammals (e.g., a rodent, a mouse, a rat, a rabbit, a monkey, a dog, 
a cat, a primate, or a pig). An animal may be a transgenic animal. In certain embodiments, 
non-human animals may be laboratory animals, raised by humans in a controlled 
environment other than their natural habitat. 

"Antibody": The term antibody refers to an immunoglobulin, whether natural or 
wholly or partially synthetically produced. All derivatives thereof which maintain specific 
binding ability are also included in the term. The term also covers any protein having a 
binding domain which is homologous or largely homologous to an immunoglobulin binding 
domain. These proteins may be derived from natural sources, or partly or wholly 
synthetically produced. An antibody may be monoclonal or polyclonal. The antibody may 
be a member of any immunoglobulin class, including any of the human classes: IgG, IgM, 
IgA, IgD, and IgE. The antibody may be a fragment of an antibody such as an Fab fragment 
or a recombinantly produced scFv fragment. 

"Cancer": Cancer refers to a malignant tumor (e.g., lung cancer) or growth of cells 
(e.g., leukemia). Cancers tend to be less differentiated than benign tumors, grow more 
rapidly, show infiltration, invasion and destruction, and may metastasize. Cancers include, 
but are not limited to, fibrosarcoma, myxosarcoma, angiosarcoma, leukemia, squamous cell 
carcinoma, basal cell carcinoma, malignant melanoma, renal cell carcinoma, hepatocellular 
carcinoma, etc. 

"Effective amount": In general, the "effective amount" of an active agent refers to the 
amount necessary to elicit a desired biological response. As will be appreciated by those of 
ordinary skill in this art, the absolute amount of a Wnt5a inhibitor that is effective may vary 
depending on such factors as the desired biological endpoint, the agent to be delivered, the 
target tissue, etc. Those of ordinary skill in the art will further understand that an "effective 
amount" may be administered in a single dose, or may be achieved by administration of 
multiple doses. For example, in the case of anti-neoplastic agents, the effective amount may 
be the amount of agent needed to reduce the size of the primary tumor, to reduce the size of a 
secondary tumor, to reduce the number of metastases, to reduce the growth rate of a tumor, to 
reduce the ability of the primary tumor to metastasize, to increase life expectancy, etc. 

"Marker gene": A "marker gene" may be any gene or gene product (e.g., protein, 
peptide, mRNA) that indicates a particular diseased or physiological state (e.g., carcinoma, 
normal, dysplasia) or indicates a particular cell type, tissue type, or origin. The expression or 
lack of expression of a marker gene may indicate a particular physiological or diseased state 
of a patient, organ, tissue, or cell. Preferably, the expression or lack of expression may be 
determined using standard techniques such as RT-PCR, sequencing, immunochemistry, gene 
chip analysis, etc. In certain embodiments, the level of expression of a marker gene is 
quantifiable. 

"Peptide" or "protein": According to the present invention, a "peptide" or "protein" 
comprises a string of at least three amino acids linked together by peptide bonds. The terms 
"protein" and "peptide" may be used interchangeably. Peptide may refer to an individual 
peptide or a collection of peptides. Inventive peptides preferably contain only natural amino 
acids, although non-natural amino acids (i.e., compounds that do not occur in nature but that 
can be incorporated into a polypeptide chain; see, for example, 
http://www.cco.caltech.edu/~dadgrp/Unnatstruct.gif, which displays structures of non-natural 
amino acids that have been successfully incorporated into functional ion channels) and/or 
amino acid analogs as are known in the art may alternatively be employed. Also, one or 
more of the amino acids in an inventive peptide may be modified, for example, by the 
addition of a chemical entity such as a carbohydrate group, a phosphate group, a farnesyl 
group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or 
other modification, etc. In a preferred embodiment, the modifications of the peptide lead to a 
more stable peptide (e.g., greater half-life in vivo). These modifications may include 
cyclization of the peptide, the incorporation of D-amino acids, etc. None of the modifications 
should substantially interfere with the desired biological activity of the peptide. 

"Polynucleotide" or "oligonucleotide": Polynucleotide or oligonucleotide refers to a 
polymer of nucleotides. Typically, a polynucleotide comprises at least three nucleotides. 
The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, 
cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), 
nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 
3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, 
C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 
8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically 
modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, 
modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose), or 
modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages). 

"Small molecule": As used herein, the term "small molecule" refers to organic 
compounds, whether naturally-occurring or artificially created (e.g., via chemical synthesis), 
that have relatively low molecular weight and that are not proteins, polypeptides, or nucleic 
acids. Typically, small molecules have a molecular weight of less than about 1500 g/mol. 
Also, small molecules typically have multiple carbon-carbon bonds. 

"Tumor": As used in the present application, "tumor" refers to an abnormal growth 
of cells. The growth of the cells of a tumor typically exceeds the growth of normal tissue and 
tends to be uncoordinated. The tumor may be benign (e.g., lipoma, fibroma, myxoma, 
lymphangioma, meningioma, nevus, adenoma, leiomyoma, mature teratoma, etc.) or 
malignant (e.g., malignant melanoma, ovarian cancer, carcinoma in situ, carcinoma, 
adenocarcinoma, liposarcoma, mesothelioma, squamous cell carcinoma, basal cell carcinoma, 
colon cancer, lung cancer, etc.). 



Brief Description of the Drawing 

Figure 1 shows the clustering of gene expression data. a. Hierarchical clustering 
dendrogram with the cluster of 19 melanomas at the center. b. MDS three-dimensional plot 
of all 31 cutaneous melanoma samples showing major clustering of 19 samples (blue, within 
cylinder), and remaining 12 samples (gold). c. A plot of the observed and expected number 
of genes producing a given number of classification errors for a partition of the 31 
melanomas into two groups of 12 and 19. Red triangles, observed clusters; filled bars, 
randomly produced clusters; open circles, predicted results for randomly variable gene 
expression. d. Introduction of random Gaussian noise followed by cuts from the top of the 
original tree (resulting in k clusters), to determine discrepant pairs after perturbation (see 
Supplementary Information in Examples). 

Figure 2 illustrates the identification of genes which discriminate melanoma clusters. 
a. MDS analysis ranking genes according to their impact on minimizing cluster volume and 
maximizing center-to-center inter-cluster distance. b. Top 22 genes obtained by these 
criteria listed in order of decreasing weight (for a full list, see Supplementary Information in 
Examples). Right, data from cutaneous melanomas identified on the horizontal axis and 
sorted by cluster (described in Maniotis et al. "Vascular channel formation by human 
melanoma cells in vivo and in vitro: vasculogenic mimicry" Am. J. Pathol. 155:739-752, 
1999; incorporated herein by reference). Left, data from uveal melanomas expressed as the 
ratio of highly invasive to less invasive. Red, high ratios; green, low ratios (intensity of 
saturation scaled according to the ratio). The three genes not scored in the uveal samples 
were not included in the print design of the cutaneous samples. 

Figure 3. Guiding gene cluster selection. a. Two-dimensional cluster analysis of 
cutaneous melanoma samples (horizontal axis) and genes (vertical axis, presented in 
segments). b-e. Data from a queried at regions corresponding to four top discriminators of 
the major cluster: MART-1 (b), CD63 (c), tropomyosin (d), and Wnt5a (e). Note that these 
clusters include other genes from the discriminator list (bold). The major cluster of 19 
samples is visually apparent on the left of this display. The full list of gene names and 
corresponding calculated ratio information is provided in the Supplementary Information in 
the Examples. 

Figure 4 shows the variation in biological properties of melanoma clusters. a-c. A 
representative member of the major melanoma cluster (UACC-1022). d-f. A sample falling 
outside of the major cluster (M93-047). The two groups differ in the ability to migrate into a 
scratch wound (a, d), contract collagen gels (b, e) and form tubular networks (c, f). Results 
of these and additional cell mobility/invasion assays are included in Table 1. Tubular 
network formation (vasculogenic mimicry (Maniotis et al. "Vascular channel formation by 
human melanoma cells in vivo and in vitro: vasculogenic mimicry" Am. J. Pathol. 155:739-752, 
1999; incorporated herein by reference), f) and collagen gel contraction (related to the 
patterning of vascular channels, e) were observed only outside the major cluster (Table 1). 

Figure 5 shows a Kaplan-Meier survival plot for a total of 15 cases, 10 from Group A 
and 5 from Group B. No statistically significant association between group and survival was 
found (p = 0.135). 

Figure 6 shows the data obtained from the top 22 genes with Wnt5a at the top of the 
list. The figure also shows a diagram of the Wnt5a and Wnt1 signaling pathways. 

Figure 7 shows the data from real time PCR analysis of three cell lines, one with low 
Wnt5a expression (which scored as having low expression in the gene chip analysis), one 
with high Wnt5a expression (which scored as having high expression in the gene chip 
analysis), and one with intermediate Wnt5a expression, an originally low scoring cell line 
which had been transfected with a vector designed to express Wnt5a. The parent and 
transfected cell lines were also analyzed for WNT5A protein abundance using Western blot 
analysis and immunohistochemical staining. 

Figure 8 shows the dramatic changes in cell morphology and cytoskeletal 
organization upon transfection of the parental cell line with a vector driving Wnt5a 
expression. The parental cell line is spindle shaped with few points of attachment to the 
culture plate and disorganized actin filaments. The transfectants are broader and flatter with 
many extensions and highly polarized actin filaments. 

Figure 9 shows the results of experiments done to look at possible cross talk between 
the Wnt5a and Wnt1 pathways. Beta-catenin was localized to the cytoplasm, indicating that 
the Wnt1 pathway is not active. The downstream target of Wnt5a, protein kinase C, was also 
observed to be phosphorylated, especially the mu and alpha/beta isoforms, indicating that the 
expected Wnt5a pathway is active. 

Figure 10 shows scratch assay and Boyden chamber assay results for the parent cell 
line as well as the transfected cell line. The results from these two standard assays show that 
increased cell movement and invasiveness correlate with increased Wnt5a expression. 

Figure 11 shows that the transition from low to high Wnt5a expression is not 
associated with increasing amounts of the G protein coupled receptor, frizzled 5 (fzd5). Also 
shown are results indicating that an antibody to fzd5 can attenuate or reverse the phenotype 
that increased Wnt5a would normally produce. 



Detailed Description of Certain Preferred Embodiments of the Invention 

The present invention provides systems for identifying and treating cancers based on 
the expression of marker genes in the cancer cells. In a particular embodiment, the cancer to 
be categorized is malignant melanoma. The invention allows for the identification of more 
aggressive forms of cancer and profiling the affected patient so that a proper treatment 
regimen can be initiated. The present invention also provides for kits useful in practicing the 
inventive methods. 

Diagnosing and Identifying Forms of Cancer 
In diagnosing or identifying a particular cancer or tumor, a test sample containing at 
least one cell from the tumor is provided to obtain a genetic sample. The test sample may be 
obtained using any technique known in the art including biopsy, blood sample, sample of 
bodily fluid (e.g., urine, lymph, ascites, cerebral spinal fluid, pleural effusion, sputum, stool, 
tears, sweat, pus, etc.), surgical excisions, needle biopsy, scraping, etc. From the test sample 
is obtained a genetic sample. The genetic sample comprises a nucleic acid, preferably RNA 
and/or DNA. For example, in determining the expression of marker genes one can obtain 
mRNA from the test sample, and the mRNA may be reverse transcribed into cDNA for 
further analysis. In another embodiment, the mRNA itself is used in determining the 
expression of marker genes. In some embodiments, the expression level of a particular 
marker gene may be determined by determining the level/presence of a gene product (e.g., 
protein), thereby eliminating the need to obtain a genetic sample from the test sample. 

The test sample is preferably a sample representative of the tumor or cancer as a 
whole. Preferably there is enough of the test sample to obtain a large enough genetic sample 
to accurately and reliably determine the expression levels of marker genes of interest in the 
cancer or tumor. In certain embodiments, multiple samples may be taken from the same 
tumor in order to obtain a representative sampling of the tumor. 

A genetic sample may be obtained from the test sample using any techniques known 
in the art (Ausubel et al. Current Protocols in Molecular Biology (John Wiley & Sons, Inc., 
New York, 1999); Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, 
Fritsch, and Maniatis (Cold Spring Harbor Laboratory Press: 1989); Nucleic Acid 
Hybridization (B. D. Hames & S. J. Higgins eds. 1984); the treatise, Methods in Enzymology 
(Academic Press, Inc., N.Y.); each of which is incorporated herein by reference). The 
nucleic acid may be purified from whole cells using DNA or RNA purification techniques. 
The genetic sample may also be amplified using PCR or in vivo techniques requiring 
subcloning. In a preferred embodiment, the genetic sample is obtained by isolating mRNA 
from the cells of the test sample and reverse transcribing the RNA into DNA in order to 
create cDNA (Khan et al. Biochem. Biophys. Acta 1423:17-28, 1999; incorporated herein by 
reference). 

Once a genetic sample has been obtained, it can be analyzed for the presence or 
absence of particular marker genes. The analysis may be performed using any techniques 
known in the art including, but not limited to, sequencing, PCR, RT-PCR, quantitative PCR, 
restriction fragment length polymorphism, hybridization techniques, Northern blot, 
microarray technology, DNA microarray technology, etc. In determining the expression level 
of a marker gene or genes in a genetic sample, the level of expression may be normalized by 
comparison to the expression of another gene such as a well known, well characterized gene 
or a housekeeping gene. 
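
By way of illustration only, normalization against a reference gene may be sketched as a simple ratio of signal intensities. The function name, the sample labels, and the intensity values below are hypothetical and are not taken from the Examples; other normalization schemes may equally be used.

```python
def normalize_expression(marker_signal, reference_signal):
    """Express a marker gene signal relative to a reference (e.g., housekeeping) gene."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return marker_signal / reference_signal


# Hypothetical raw signal intensities for two test samples (illustrative values only).
samples = {
    "sample_1": {"marker": 1200.0, "reference": 400.0},
    "sample_2": {"marker": 300.0, "reference": 600.0},
}

# Ratio of marker signal to reference signal for each sample.
normalized = {
    name: normalize_expression(values["marker"], values["reference"])
    for name, values in samples.items()
}
```

With such a ratio, samples measured at different overall signal strengths may be compared on a common scale.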

The expression data from a particular marker gene or group of marker genes may be 
analyzed using statistical methods described below in the Examples in order to determine the 
phenotype or characteristic of a particular tumor or cancer. Methods used in classifying 
tumors based on gene expression data are described in Ben-Dor et al. J. Comput. Biol. 7(3 & 
4):559-584, 2000; incorporated herein by reference. The analyzed data may also be used to 
select/profile patients for a particular treatment protocol. 

For example, the present invention demonstrates that marker gene Wnt5a is expressed 
at high levels in more aggressive forms of malignant melanomas. A patient with malignant 
melanoma may have the expression level of Wnt5a in the cells of his/her tumor determined in 
order to help determine the prognosis and/or treatment plan for his/her particular disease. 
The expression level of Wnt5a would preferably be one of several factors used in deciding 
the prognosis or treatment plan of a patient. Preferably a trained and fully licensed physician 
would be consulted in determining the patient's prognosis and treatment plan. A high level 
of expression of Wnt5a may indicate a worse prognosis and suggest a more aggressive 
treatment plan. The treatment plan may also include inhibitors of Wnt5a activity such as 
anti-sense agents and gene therapy directed against Wnt5a. Small molecule inhibitors of Wnt5a 
activity may also be used in the treatment plan as well as pharmaceuticals that inhibit the 
Wnt5a pathway either upstream or downstream of Wnt5a itself. 



Marker Genes 

The present invention provides several marker genes that correlate with particularly 
aggressive forms of malignant melanoma. These markers may also be useful in categorizing 
tumors or cancers other than malignant melanoma. For example, inventive marker 
genes may be useful in categorizing other types of skin cancer. Preferred marker genes 
include Wnt5a, MART-1, pirin, HADHB, CD63, EDNRB, PGAM1, HXB, RXRA, integrin 
b1, syndecan 4, tropomyosin 1, AXL, EphA2, GAP43, PFKL, synuclein a, annexin A2, 
CD20, and RAB2, and combinations thereof. Other potential marker genes are listed in the 
Examples below. Particular sets of marker genes may be defined using statistical methods as 
described in the Examples in order to decrease or increase the specificity or sensitivity of the 
set. For example, a particular set of marker genes highly specific for aggressive forms of 
malignant melanoma may be less sensitive (i.e., a negative result may occur in the presence 
of an aggressive form of melanoma). 
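
By way of illustration only, the trade-off between sensitivity and specificity of a marker set may be sketched using the conventional definitions (true positives over all aggressive cases, and true negatives over all non-aggressive cases). The function name and the counts below are hypothetical and do not represent data from the Examples.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from the four outcome counts of a test."""
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one positive case and one negative case")
    sensitivity = true_pos / (true_pos + false_neg)  # aggressive cases detected
    specificity = true_neg / (true_neg + false_pos)  # non-aggressive cases cleared
    return sensitivity, specificity


# Hypothetical outcome counts for one marker set (illustrative values only):
# 10 aggressive tumors, of which 8 are flagged; 20 other tumors, of which 2 are flagged.
sens, spec = sensitivity_specificity(true_pos=8, false_neg=2, true_neg=18, false_pos=2)
```

In this hypothetical case the marker set misses two of ten aggressive tumors, which corresponds to the negative results in the presence of aggressive disease described above.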

Different subsets of marker genes may be developed that show optimal function with 
different races, ethnic groups, sexes, geographic groups, stages of disease, types of cancer, 
cell types, etc. Subsets of marker genes may also be developed to be sensitive to the effect of 
a particular therapeutic regimen on disease progression. 

One particularly useful marker gene in the diagnosis of aggressive forms of malignant 
melanoma is Wnt5a. The Wnt genes make up a large family of highly conserved genes that 
have been studied extensively in development. The first member, int-1, was discovered as a 
common integration site of mouse mammary tumor virus (MMTV) in mammary epithelial 
adenocarcinomas (Nusse and Varmus Cell 69:1073-1087, 1992; incorporated herein by 
reference). Int-1 is highly homologous to the Drosophila developmental gene wingless, which 
is involved in pattern formation. The combination of wingless and int-1 gives rise to the term 
Wnt. Homologues of Wnt genes have been isolated in Drosophila, Xenopus, chicken, mouse, 
and humans (Nusse and Varmus Cell 69:1073-1087, 1992; incorporated herein by reference). 
In humans, there are nine Wnt genes known including Wnt5a (Clark et al. Genomics 
18:249-260, 1993; Lejeune et al. Clin. Cancer Res. 1:215-222, 1995; each of which is incorporated 
herein by reference). Wnt5a has been found to be up-regulated in lung, colon, and prostate 
carcinomas and melanomas (Iozzo et al. Cancer Res. 55:3495-3499, 1995; incorporated 
herein by reference). 

The sequence of the mRNA of Homo sapiens wingless MMTV integration site 
family, member 5a (Wnt5a) is shown below: 

1 attaattctg gctccacttg ttgctcggcc caggttgggg agaggacgga 
gggtggccgc 

25 61 agcgggttcc tgagtgaatt acccaggagg gactgagcac agcaccaact 

agagaggggt 
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121 cagggggtgc gggactcgag 
gggctttgac 

181 tcaacagaat tgagacacgt 
atcccagcga 
5 241 aaatcagatt tcctggtgag 

aactgcctat 

3 01 atcttgccat caaaaaactc 
aacttaagag 

3 61 acccccgatg ctcccctggt 
10 gggaataaac 

421 atcttttcct tcttccctct 
agttgctttg 

481 gggatggctg gaagtgcaat 
catatttttc 
15 541 tccttcgccc aggttgtaat 

gaataaccct 

6 01 gttcagatgt cagaagtata 
'% actggcagga 

If. 661 ctttctcaag gacagaagaa 

;5'0 gtacatcgga 

'^1 721 gaaggcgcga agacaggcat 

'!fl; acggtggaac 

y 781 tgcagcactg tggataacac 

cagccgcgag 
?25 841 acggccttca catacgccgt 

ccgggcgtgc 

901 cgcgagggcg agctgtccac 

ggacctgccg 
fU 961 cgggactggc tctggggcgg 

C30 ctttgccaag 

M' 1021 gagttcgtgg acgcccgcga 

cgagagtgct 

1081 cgcatcctca tgaacctgca 
caacctggct 
35 1141 gatgtggcct gcaagtgcca 

atgctggctg 

12 01 cagctggcag acttccgcaa 
cagcgcggcg 

1261 gccatgcggc tcaacagccg 
40 caactcgccc 

1321 accacacaag acctggtcta 
caatgagagc 

13 81 accggctcgc tgggcacgca 
catggatggc 

45 1441 tgcgagctca tgtgctgcgg 

gacggagcgc 

1501 tgccactgca agttccactg 
ggagatcgtg 

1561 gaccagtttg tgtgcaagta 
50 ccaggacccg 

1621 cttatttata gaaagtacag 
ttttattttt 

16 81 ccccaagaat tgcaaccgga 
ctctgtggtt 



cgagcaggaa ggaggcagcg cctggcacca 
ttgtaatcgc tggcgtgccc cgcgcacagg 
gttgcgtggg tggattaatt tggaaaaaga 
acggaggaga agcgcagtca atcaacagta 
ttaacttgta tgcttgaaaa ttatctgaga 
ccagaagtcc attggaatat taagcccagg 
gtcttccaag ttcttcctag tggctttggc 
tgaagccaat tcttggtggt cgctaggtat 
tattatagga gcacagcctc tctgcagcca 
actgtgccac ttgtatcagg accacatgca 
caaagaatgc cagtatcaat tccgacatcg 
ctctgttttt ggcagggtga tgcagatagg 
gagcgcagca ggggtggtga acgccatgag 
ctgcggctgc agccgcgccg cgcgccccaa 
ctgcggcgac aacatcgact atggctaccg 
gcgggagcgc atccacgcca agggctccta 
caacaacgag gccggccgca ggacggtgta 
tggggtgtcc ggctcatgta gcctgaagac 
ggtgggtgat gccctgaagg agaagtacga 
gggcaagttg gtacaggtca acagccgctt 
catcgacccc agccctgact actgcgtgcg 
gggccgcctg tgcaacaaga cgtcggaggg 
ccgtgggtac gaccagttca agaccgtgca 
gtgctgctac gtcaagtgca agaagtgcac 
gtgggtgcca cccagcactc agccccgctc 
tgattctggt ttttggtttt tagaaatatt 
accatttttt ttcctgttac catctaagaa 
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1741 tattattaat attataatta ttatttggca ataatggggg tgggaaccac 
gaaaaatatt 

18 01 tattttgtgg atctttgaaa aggtaataca agacttcttt tggatagtat 
agaatgaagg 

5 1861 gggaaataac acatacccta acttagctgt gtgggacatg gtacacatcc 

agaaggtaaa 

1921 gaaatacatt ttctttttct caaatatgcc atcatatggg atgggtaggt 
tccagttgaa 

1981 agagggtggt agaaatctat tcacaattca gcttctatga ccaaaatgag 
10 ttgtaaattc 

2 041 tctggtgcaa gataaaaggt cttgggaaaa caaaacaaaa caaaacaaac 
ctcccttccc 

2101 cagcagggct gctagcttgc tttctgcatt ttcaaaatga taatttacaa 
tggaaggaca 

15 2161 agaatgtcat attctcaagg aaaaaaggta tatcacatgt ctcattctcc 

f- tcaaatattc 

2221 catttgcaga cagaccgtca tattctaata gctcatgaaa tttgggcagc 
agggaggaaa 

2281 gtccccagaa attaaaaaat ttaaaactct tatgtcaaga tgttgatttg 
20 aagctgttat 

2341 aagaattggg attccagatt tgtaaaaaga cccccaatga ttctggacac 
tagatttttt 

Iff 24 01 gtttggggag gttggcttga acataaatga aatatcctgt attttcttag 

ggatacttgg 

':,.;> 5 2461 ttagtaaatt ataatagtag aaataataca tgaatcccat tcacaggttt 

ctcagcccaa 

2521 gcaacaaggt aattgcgtgc cattcagcac tgcaccagag cagacaacct 
atttgaggaa 

2581 aaacagtgaa atccaccttc ctcttcacac tgagccctct ctgattcctc 
lr 60 cgtgttgtga 

: H ! 2641 tgtgatgctg gccacgtttc caaacggcag ctccactggg tcccctttgg 

ttgtaggaca 

2701 ggaaatgaaa cattaggagc tctgcttgga aaacagttca ctacttaggg 
atttttgttt 

35 2761 cctaaaactt ttattttgag gagcagtagt tttctatgtt ttaatgacag 

aacttggcta 

2821 atggaattca cagaggtgtt gcagcgtatc actgttatga tcctgtgttt 
agattatcca 

2881 ctcatgcttc tcctattgta ctgcaggtgt accttaaaac tgttcccagt 
40 gtacttgaac 

2 941 agttgcattt ataagggggg aaatgtggtt taatggtgcc tgatatctca 
aagtcttttg 

3 001 tacataacat atatatatat atacatatat ataaatataa atataaatat 
atctcattgc 

45 3061 agccagtgat ttagatttac agcttactct ggggttatct ctctgtctag 

agcattgttg 

3121 tccttcactg cagtccagtt gggattattc caaaagtttt ttgagtcttg 
agcttgggct 

3181 gtggccccgc tgtgatcata ccctgagcac gacgaagcaa cctcgtttct 
50 gaggaagaag 

3241 cttgagttct gactcactga aatgcgtgtt gggttgaaga tatctttttt 
tcttttctgc 

3 3 01 ctcacccctt tgtctccaac ctccatttct gttcactttg tggagagggc 
attacttgtt 
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3361 cgttatagac atggacgtta agagatattc aaaactcaga agcatcagca 
atgtttctct 

3421 tttcttagtt cattctgcag aatggaaacc catgcctatt agaaatgaca 
gtacttatta 

3481 attgagtccc taaggaatat tcagcccact acatagatag cttttttttt 
tttttttttt 

3541 ttttaataag gacacctctt tccaaacagg ccatcaaata tgttcttatc 
tcagacttac 

3601 gttgttttaa aagtttggaa agatacacat cttttcatac ccccccttag 
gaggttgggc 

3661 tttcatatca cctcagccaa ctgtggctct taatttattg cataatgata 
tccacatcag 

3721 ccaactgtgg ctctttaatt tattgcataa tgatattcac atcccctcag 
ttgcagtgaa 

3781 ttgtgagcaa aagatcttga aagcaaaaag cactaattag tttaaaatgt 
cacttttttg 

3841 gtttttatta tacaaaaacc atgaagtact ttttttattt gctaaatcag 
attgttcctt 

3901 tttagtgact catgtttatg aagagagttg agtttaacaa tcctagcttt 
taaaagaaac 

3961 tatttaatgt aaaatattct acatgtcatt cagatattat gtatatcttc 
tagcctttat 

4021 tctgtacttt taatgtacat atttctgtct tgcgtgattt gtatatttca 
ctggtttaaa 

4081 aaacaaacat cgaaaggctt attccaaatg gaag 



The translated sequence of Wnt5a is as follows: 

MAGSAMSSKFFLVALAIFFSFAQVVIEANSWWSLGMNNPVQMSE 

VYIIGAQPLCSQLAGLSQGQKKLCHLYQDHMQYIGEGAKTGIKECQYQFRHRRWNCST 
VDNTSVFGRVMQIGSRETAFTYAVSAAGVVNAMSRACREGELSTCGCSRAARPKDLPR 
DWLWGGCGDNIDYGYRFAKEFVDARERERIHAKGSYESARILMNLHNNEAGRRTVYNL 
ADVACKCHGVSGSCSLKTCWLQLADFRKVGDALKEKYDSAAAMRLNSRGKLVQVNSRF 
NSPTTQDLVYIDPSPDYCVRNESTGSLGTQGRLCNKTSEGMDGCELMCCGRGYDQFKT 
VQTERCHCKFHWCCYVKCKKCTEIVDQFVCK (Seq. ID No.: 3) 

Other sequences homologous to the above sequences may also be used in the present 
invention. Preferably the sequence is at least 70% identical to the human Wnt5a DNA and 
protein sequences listed above. More preferably the sequence is at least 80%, 90%, 95%, 
97%, 98%, 99%, or >99% identical. A homolog of Wnt5a may also be identified by its 
activity. In another preferred embodiment, the homolog of Wnt5a is identified by its location 
in the genome (e.g., location on the chromosome). 
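
The identity thresholds above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and defines percent identity as matching columns divided by compared columns; the text does not fix a particular alignment method or denominator, so both choices, and the function names `percent_identity` and `meets_threshold`, are assumptions made only for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap ('-') are skipped; a gap
    opposite a residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # nothing to compare in this column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair reaches the given identity threshold (70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under these assumptions, a candidate that differs from the first ten residues of Seq. ID No.: 3 at one position would score 90% and pass the 70%, 80% and 90% thresholds but not the 95% threshold.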
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The present invention also provides a novel method of identifying compounds useful 
in the treatment of patients with cancer. In certain embodiments, the cancer is malignant 
melanoma. In other embodiments, the cancer is a malignant melanoma expressing Wnt5a. In 
particular, the inventive method identifies compounds directed against Wnt5a or Wnt5a 
activity specifically, or more generally, against downstream or upstream signals in the Wnt5a 
pathway. 

Any compound, moiety, or entity can be screened for activity against Wnt5a 
according to the present invention. For example, polynucleotides, peptides, proteins, natural 
products, chemical compounds, small molecules, polymers, biomolecules, etc. may be tested. 
The agents to be screened may be prepared by purification or synthesis, or may be obtained 
from commercial or other stock sources. 

The assay used to screen the agents may be an in vitro or in vivo assay. For example, 
an in vitro assay may utilize purified or partially purified WNT5A protein. The WNT5A 
protein may be obtained by purifying the protein from a natural source or from a cell, such as 
bacteria, mammalian cells, yeast, or fungi, overexpressing WNT5A. Methods for 
overexpressing and purifying the proteins encoded by cloned genes are well known in the art 
(see, Ausubel et al. Current Protocols in Molecular Biology (John Wiley & Sons, Inc., New 
York, 1999); Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch, 
and Maniatis (Cold Spring Harbor Laboratory Press: 1989); each of which is incorporated 
herein by reference). Agents may be screened for their ability to bind the WNT5A protein or 
to enhance or prevent an interaction between WNT5A and another protein, peptide, 
polynucleotide, or chemical compound. Agents may also be screened for their ability to 
affect downstream effects of WNT5A. Agents may be screened using high-throughput 
techniques known in the art. 

In one embodiment of an in vivo assay, a cell expressing Wnt5a is contacted with an 
agent to be tested. The level of Wnt5a expression or activity is then determined using an 
assay known in the art. These assays may include but are not limited to Northern blot 
analysis, enzyme activity, quantitative PCR, Western blot analysis, etc. As would be 
appreciated by one of skill in this art, experiments designed to screen for agents directed 
against Wnt5a may include proper positive and/or negative controls. The experiment may 
also include testing a particular agent at several different concentrations in the range of about 
1 nM to about 100 mM, preferably about 1 nM to about 1 mM, more preferably about 1 nM to 
about 100 µM. 
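
As one way to lay out such a concentration range, the following Python sketch generates a log-spaced series between two molar concentrations. The spacing of one point per decade and the function name `log_dilution_series` are assumptions for illustration; the text only states the ranges, not how the test concentrations are spaced.

```python
import math


def log_dilution_series(low_molar: float, high_molar: float,
                        points_per_decade: int = 1) -> list:
    """Log-spaced concentrations (in mol/L) from low_molar to high_molar.

    The spacing (points_per_decade) is a hypothetical design choice;
    the range end points are the values supplied by the caller.
    """
    if low_molar <= 0 or high_molar <= low_molar:
        raise ValueError("require 0 < low_molar < high_molar")
    decades = math.log10(high_molar / low_molar)
    steps = round(decades * points_per_decade)
    return [low_molar * 10 ** (i / points_per_decade) for i in range(steps + 1)]


# Example: the "more preferred" range of about 1 nM to about 100 uM
series = log_dilution_series(1e-9, 1e-4)
```

With one point per decade, the 1 nM to 100 µM range yields six test concentrations (1 nM, 10 nM, 100 nM, 1 µM, 10 µM, 100 µM), to which a no-agent control would be added.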

In one preferred embodiment, the cells used in the screening method are skin cells, 
more preferably malignant melanoma cells. In certain embodiments, the cells or cell line are 
genetically engineered to express Wnt5a. In certain embodiments, the cells are malignant 
melanoma cells that did not express Wnt5a naturally but have been genetically engineered to 
express Wnt5a. Preferred embodiments of such cells and cell lines are described below in the 
Examples. 

Inventive methods of detecting whether a compound inhibits Wnt5a may include an 
assay which assesses the ability of the cells to "chew through", digest, or migrate through 
extracellular matrix as described below in the Examples. Assays of this type may include, 
but are not limited to, the scratch assay and the Boyden chamber assay. A cell that 
overexpresses Wnt5a may be able to digest or migrate through extracellular matrix in its 
search for media or nutrients. Agents that inhibit such a cell's ability to digest extracellular 
matrix, and that may therefore be inhibiting the activity of Wnt5a, may be useful in the treatment 
of malignant melanoma expressing Wnt5a. In a preferred embodiment, the agent reduces the 
ability of the cell to digest or migrate through extracellular matrix by at least about 50% when 
compared to cells that were not contacted with the agent, more preferably by at least about 
75%, and most preferably by at least about 90%. 
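
The reduction thresholds above can be expressed as a short calculation. The sketch below is illustrative only: it assumes the assay readout is a count of cells that crossed the matrix (for example in a Boyden chamber), and the names `percent_reduction` and `preference_tier` are hypothetical.

```python
def percent_reduction(control_count: float, treated_count: float) -> float:
    """Percent reduction in invading or migrating cells relative to untreated control."""
    if control_count <= 0:
        raise ValueError("control count must be positive")
    return 100.0 * (control_count - treated_count) / control_count


def preference_tier(reduction: float) -> str:
    """Map a percent reduction onto the 50% / 75% / 90% thresholds in the text."""
    if reduction >= 90.0:
        return "most preferred"
    if reduction >= 75.0:
        return "more preferred"
    if reduction >= 50.0:
        return "preferred"
    return "below threshold"
```

For example, if 200 control cells and 50 treated cells cross the matrix, the reduction is 75%, which falls in the "more preferred" tier.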

In certain other embodiments, cell morphology or cytoskeletal organization may be 
used to assess the effect of an agent on cells expressing Wnt5a. The cells may be contacted 
with various concentrations of the agent, with a control plate of cells contacted with no agent. 
The shape of the cells, number of attachments of each cell to the plate, and/or the 
organization of actin filaments may be assessed to determine the effect of the agent on the 
cells. In other embodiments, downstream signaling molecules in the Wnt5a pathway are 
analyzed to determine the effect of the added agent. In one embodiment, the phosphorylation 
of protein kinase C is used to determine the effect of the agent. 

In other embodiments, agents may be screened for their ability to inhibit or knock out 
the Wnt5a pathway as shown in Figure 6. In one embodiment, agents may be screened for 
their ability to block the binding of WNT5A to its receptor, frizzled 5. An agent able to block 
this binding interaction could possibly attenuate or reverse the phenotypes that increased 
WNT5A would normally produce, such as increased cell movement and invasiveness. 

These and other aspects of the present invention will be further appreciated upon 
consideration of the following Examples, which are intended to illustrate certain particular 
embodiments of the invention but are not intended to limit its scope, as defined by the claims. 

Examples 

Example 1-Molecular Classification of Cutaneous Malignant Melanoma by Gene 
Expression Profiling 

We have proposed that a discrete and previously unrecognizable cancer taxonomy can 
be identified by viewing the systematized data from gene expression experiments (Bittner et 
al. Nature 406:536-540, 3 August 2000; incorporated herein by reference). However, for 
melanoma, inherent or technically induced variation could obscure such a classification as its 
appearance is very similar between patient samples and, in contrast to haematologic cancers 
(Golub et al. "Molecular classification of cancer, class discovery and class prediction by gene 
expression monitoring" Science 286:531-537, 1999; Alizadeh et al. "Distinct types of diffuse 
large B-cell lymphoma identified by gene expression profiling" Nature 403:503-511, 2000; 
each of which is incorporated herein by reference), it has few known recurring genetic 
changes. To explore this question, we gathered expression profiles for 38 samples, including 
31 melanomas and 7 controls (Table 1). Total messenger RNA was isolated directly from 
melanoma biopsies or tumor cell cultures, and fluorescent complementary DNA was prepared 
from the message and hybridized to a microarray containing probes for 8,150 cDNAs 
(representing 6,971 unique genes), yielding quantitative and comparative measurements for 
each gene. 

The tumor cell mRNA was compared with a single reference probe, providing 
normalized measures of the expression of each gene in each sample relative to the standard. 
Analysis of the normalized expression across all genes between samples provided a measure 
of the overall difference in expression pattern between samples. Similarly, the orthogonal 
analysis of linear covariance between pairs of genes across all samples provided a measure of 
the similarity of behavior of the genes studied. 

Figure 1 shows the integration of several analytical methods to visualize the overall 
expression pattern relationships between cutaneous melanoma tumor samples. Using a 
matrix of Pearson correlation coefficients from the complete pairwise comparison of all 
experiments (Bittner et al. "Data analysis and integration of steps and arrows" Nature Genet. 
22:213-215, 1999; incorporated herein by reference), the 31 melanoma experiments are 
displayed as a hierarchical clustering dendrogram (Khan et al. "Gene expression profiling of 
alveolar rhabdomyosarcoma with cDNA microarrays" Cancer Res. 58:5009-5013, 1998; 
Eisen et al. "Cluster analysis and display of genome-wide expression patterns" Proc. Natl. 
Acad. Sci. USA 95:14863-14868, 1998; each of which is incorporated herein by reference) 
and as a three-dimensional multidimensional scaling (MDS) plot (Khan et al. "Gene 
expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays" Cancer Res. 
58:5009-5013, 1998; Everitt, B. Applied Multivariate Data Analysis. (Oxford Univ. Press, 
New York, 1992); each of which is incorporated herein by reference). The MDS plot displays 
the position of each tumor sample in three-dimensional Euclidean space, with the distance between 
experimental samples reflecting their approximate degree of correlation (Khan et al. "Gene 
expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays" Cancer Res. 
58:5009-5013, 1998; Everitt, B. Applied Multivariate Data Analysis. (Oxford Univ. Press, 
New York, 1992); each of which is incorporated herein by reference). The analysis included 
all genes meeting a minimum level of expression in each hybridization. We also employed a 
non-hierarchical clustering algorithm (termed cluster affinity search technique; CAST) (Ben-Dor 
et al. "Clustering gene expression patterns" J. Comput. Biol. 6:281-297, 1999; incorporated 
herein by reference) to define experimental clusters. The resulting hierarchical dendrogram 
of the 31 melanoma samples (Fig. 1a) demonstrates that 19 samples are tightly clustered at the 
bottom of the dendrogram in the area of highest similarity. Likewise, the non-hierarchical 
CAST algorithm identified the identical major cluster of 19 melanomas. This cluster is also a 
compact, readily separable grouping based on its overall similarity of expression pattern 
viewed by MDS (Fig. 1b). 
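
The clustering step described above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: it assumes each sample is a list of log ratios over the same genes, uses one minus the Pearson correlation coefficient as the dissimilarity with average linkage (as stated in the Statistical methods), and stops merging when the closest pair of clusters exceeds a cutoff. The function names and the toy data layout are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, cutoff):
    """Agglomerative clustering of samples; profiles maps sample name -> log ratios.

    Dissimilarity is 1 - Pearson r. Clusters are merged greedily (closest
    pair first) until the smallest average between-cluster dissimilarity
    exceeds the cutoff.
    """
    names = list(profiles)
    dissim = {
        frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average dissimilarity over all cross-cluster sample pairs
        total = sum(dissim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

For 31 samples the dissimilarity table holds 31 x 30 / 2 = 465 entries, one per pair of samples. The cutoff reported in the Statistical methods (0.54) could be passed as `cutoff`; the sketch recomputes linkages on every pass, which is adequate for a few dozen samples but not tuned for large data sets.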

There is no single established method to estimate the significance of an observed 
degree of relationship obtained by cluster prediction techniques (Golub et al. "Molecular 
classification of cancer, class discovery and class prediction by gene expression monitoring" 
Science 286:531-537, 1999; Bittner et al. "Data analysis and integration of steps and arrows" 
Nature Genet. 22:213-215, 1999; each of which is incorporated herein by reference). 
Accordingly, we used two independent approaches to test the validity of our cluster 
prediction of the 19-element cluster. The first approach (Fig. 1c) examines the power of 
individual genes to discriminate the major cluster of 19 from the remaining samples by 
examining the frequency of strong classifier genes compared to the expected frequency of 
such genes if expression is randomly variable, and to the frequency of strong classifiers in 
random partitions of the same samples into new groupings of 19 and 12 (Ben-Dor et al. 
"Class Discovery in Gene Expression Data" Proceedings of RECOMB 2001, pp. 31-38, 2001; 
incorporated herein by reference). The non-randomness of the cluster results is evident. 
Specifically, many genes have expression patterns that differ strongly between the initial 
sample clusters and thus serve as good classifiers (Fig. 1c, red triangles). However, 
expression patterns are not readily found which classify the samples when they are grouped 
into random partitions of the same size (Fig. 1c, blue lines). Accordingly, in randomly 
formed clusters, expression behavior is essentially indistinguishable from truly random 
behavior of genes relative to these clusters (Fig. 1c, compare blue lines with open circles). 
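
The comparison against random partitions can be illustrated as follows. The text does not state how a "strong classifier" gene is scored, so this sketch assumes a simple rule: a gene's score is the smallest number of samples misclassified by any single expression threshold, and a gene is "strong" when that number is at or below a chosen limit. The scoring rule, the limit, and all function names are assumptions for illustration only.

```python
import random


def threshold_errors(values, labels):
    """Fewest misclassified samples over all single-threshold rules for one gene.

    values: expression values for one gene (assumed distinct across samples).
    labels: 0/1 cluster membership per sample, in the same order.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    best = min(n_pos, n_neg)  # rule that puts every sample in one class
    order = sorted(range(n), key=lambda i: values[i])
    pos_left = 0
    for rank, i in enumerate(order, start=1):
        pos_left += labels[i]
        neg_left = rank - pos_left
        low_is_negative = pos_left + (n_neg - neg_left)
        low_is_positive = neg_left + (n_pos - pos_left)
        best = min(best, low_is_negative, low_is_positive)
    return best


def count_strong_classifiers(genes, labels, max_errors):
    """Number of genes whose best threshold rule makes at most max_errors mistakes."""
    return sum(1 for values in genes if threshold_errors(values, labels) <= max_errors)


def random_partition_counts(genes, labels, max_errors, n_partitions, seed=0):
    """Strong-classifier counts for random relabelings that keep the group sizes."""
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_partitions):
        rng.shuffle(shuffled)
        counts.append(count_strong_classifiers(genes, shuffled, max_errors))
    return counts
```

Shuffling the label vector preserves the group sizes (19 and 12 in the study), so the count obtained for the observed clusters can be compared directly with the distribution of counts from the shuffled labelings.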

The second approach we used to test the validity of the cluster predictions is based on 
evaluating cluster membership after introducing random perturbations to the data set. For 
each sample, the log-ratio of each gene was perturbed by the introduction of random Gaussian 
noise with the mean equal to 0 and the standard deviation equal to 0.15 (an estimate of 
variation derived by computing the median standard deviation of the log-ratios for single 
genes across all 31 samples). Hierarchical clustering was then performed on the perturbed 
data set and a comparison made between the original tree (Fig. 1a) and the perturbed tree. 
Comparisons involved cutting the original and perturbed trees into k clusters followed by 
computing the proportion of paired samples clustering together in the original tree that did 
not cluster together in the perturbed tree (we refer to this measure as a weighted proportion of 
discrepant pairs because it gives more weight to larger clusters). The comparison was 
repeated over multiple perturbed data sets for each possible cut in the original tree (k = 2, 3, 
..., 30). For a given k, the weighted proportion of discrepant pairs was then averaged over the 
perturbed data sets resulting in the identification of weighted average discrepant pairs 
(WADP_k; see Supplementary Information). 
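
The perturbation test can be sketched as below. This is an illustrative reading of the description, not the study's code: cluster assignments are assumed to be given as one label per sample (obtained by cutting each tree into k clusters with whatever clustering routine is in use), and the function names are hypothetical.

```python
import random
from itertools import combinations


def perturb(log_ratios, sd=0.15, rng=None):
    """Add Gaussian noise (mean 0, standard deviation sd) to every log ratio.

    log_ratios: list of samples, each a list of per-gene log ratios.
    """
    rng = rng or random.Random()
    return [[value + rng.gauss(0.0, sd) for value in sample] for sample in log_ratios]


def discrepant_pair_proportion(original_labels, perturbed_labels):
    """Fraction of sample pairs that share a cluster in the original cut
    but are separated in the perturbed cut.

    Counting pairs gives larger clusters more weight, since a cluster of
    m samples contributes m * (m - 1) / 2 pairs.
    """
    n = len(original_labels)
    together = [
        (i, j) for i, j in combinations(range(n), 2)
        if original_labels[i] == original_labels[j]
    ]
    if not together:
        return 0.0
    split = sum(1 for i, j in together if perturbed_labels[i] != perturbed_labels[j])
    return split / len(together)


def weighted_average_discrepant_pairs(original_labels, perturbed_label_sets):
    """Average the discrepant-pair proportion over several perturbed data sets."""
    proportions = [
        discrepant_pair_proportion(original_labels, labels)
        for labels in perturbed_label_sets
    ]
    return sum(proportions) / len(proportions)
```

A value near 0 for a given k means the k-cluster cut is stable under noise of this magnitude; values rising with k indicate that finer subdivisions are less reproducible.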

Clusters that result from cutting the original tree into 9 or fewer groups are very 
reproducible (Fig. 1d). It is noteworthy that the rise in WADP_k almost exactly coincides with 
the division of the major 19-element cluster into smaller sub-clusters. These results strongly 
support the view that the major cluster of melanoma samples identified in this study 
represents a bona fide and highly reproducible grouping. 

We then performed statistical tests to determine whether any clinical or tumour cell 
characteristics were specifically associated with the clustered group. Tests for associations 
between the major cluster of 19 samples and the remaining 12 melanoma samples were 
performed for several in vivo variables, including sex, age, biopsy site, Breslow thickness, 
Clark's level and survival. There was no statistically significant association between the 
cluster group and any clinical variable. There were also no significant associations with the 
in vitro variables, including p16 or β-catenin mutation status, in vitro pigmentation and cell 
passage number (see Supplementary Information). 

We included two pairs of specimens derived from the same patient in this sample set. 
These are M92-001 and M93-007 (two different samples from the same individual, surgically 
removed one year apart), and TD-1376-3 and TC-1376-3 (the biopsy sample and a cell 
culture of the same tumour carried three passages in vitro). Although there was no 
significant association between cell passage number and cluster group (P = 0.857, see 
Supplementary Information), the TD-1376-3/TC-1376-3 pair was included to serve as 
another control for the effects of cell culture. Remarkably, of the 465 pairwise comparisons 
among the melanoma samples, the pairs TD-1376-3/TC-1376-3 and M92-001/M93-007 are 
the second and third most highly correlated pairs of samples, with nearly identical correlation 
coefficients (Fig. 1b). 

On the basis of the linear correlation of global gene expression in Fig. 1, Figs 2 and 3 
illustrate the approach we have used to guide 'gene cluster' interpretation empirically. Fig. 
2a depicts our statistical method for extracting a 'weighted list' of individual genes whose 
variance of change across all experiments correctly defines the boundary of a given sample 
cluster (for details see Supplementary Information). Fig. 2b displays the list of genes with 
the most power to define the major melanoma cluster of 19 samples (Fig. 1a and b) in rank 
order along the vertical axis. The samples are ordered along the horizontal axis by cluster 
inclusion, and data are presented graphically as coloured images with the colour saturation 
directly proportional to the magnitude of the measured gene expression ratio (brightest red, 
highest R/G ratio; black squares, R/G ratio = 1; brightest green, lowest R/G ratio). The 
complete list of genes discriminating the major cluster is in the Supplementary Information. 

The weighted gene list can also be used to guide analysis of the larger gene 
expression data set. Figure 3a displays all data from the cutaneous melanoma samples in this 
study as a coloured image with genes ordered along the vertical axis by similarity of 
expression pattern (after Eisen et al. "Cluster analysis and display of genome-wide 
expression patterns" Proc. Natl. Acad. Sci. USA 95:14863-14868, 1998; incorporated herein 
by reference). However, rather than basing analysis of this large (>300,000 elements) data 
set entirely on visual selection, we used genes from the weighted list to index gene cluster 
selection. Figure 3b-e illustrates this approach using four genes from the 'weighted list' in 
Fig. 2b (MART-1, CD63, tropomyosin and WNT5A), to interrogate the entire gene 
expression data set represented in Fig. 3a. 

Table 1 Summary of melanoma cases by cluster designation 

Control samples 

Nil. C (fibroblast), UACC-3149 (ovarian adenocarcinoma), MCF-10A (breast epithelium), CRL-1634 (fibroblast), SRS-3 (cell culture variant), SRS-5 (cell culture variant), RMS-13 (rhabdomyosarcoma) 

* Mutation status of indicated samples for p16 obtained by sequencing. Deleted, homozygous. Supplementary Information includes the specific mutations in p16 for each sample tested. Samples were also sequenced for β-catenin. No example of β-catenin mutation was observed. 

† Ability to invade a defined basement matrix. P = 0.0055; t-test for two populations. 

‡ Tube-forming ability at 5 days in a three-dimensional Matrigel matrix. 

§ Ability to contract floating collagen I gels at 5 days as compared to HT-1080 fibrosarcoma cells (Maniotis et al. "Vascular channel formation by human melanoma cells in vivo and in vitro: vasculogenic mimicry" Am. J. Pathol. 155:739-752, 1999; incorporated herein by reference). 

|| Migration rates expressed in µm per day. Mean from eight experiments ± s.d. (P = 0.0063; t-test for two populations). Rates below 100 µm per day completely segregate in the melanoma primary cluster. 

¶ Ability to close in vitro scratch wound at 24 h. Photographs of the wound were measured and percentage wound closure determined (Silletti et al. "Autocrine motility factor and the extracellular matrix I. Coordinate regulation of melanoma cell adhesion, spreading and migration involves focal contact reorganization" Int. J. Cancer 76:120-128, 1998; incorporated herein by reference) (P < 0.00002; t-test for two populations). 

# M91-054 was the only sample that demonstrated a mixed phenotype in culture with both an epithelioid population and a more fibroblastic population. Vasculogenic mimicry and gel contraction were only observed in the epithelioid population. Scratch assay resulted in ...% closure after 24 h for both populations. 

☆ TC-1376 mRNA was isolated after short-term (3 passage) culture of the biopsy sample from the patient TD-1376, allowing the effects of short-term culture on the expression profile to be observed. 

** UACC-647 cells form extensive cord-like networks by 5 days. 

Finally, in parallel to our microarray analysis of cutaneous melanoma, we studied a 
series of uveal melanoma specimens characterized for properties related to metastasis, 
including invasive ability and vasculogenic mimicry in vitro (Maniotis et al. "Vascular 
channel formation by human melanoma cells in vivo and in vitro: vasculogenic mimicry" 
Am. J. Pathol. 155:739-752, 1999; incorporated herein by reference). These samples were 
hybridized pairwise, directly comparing highly invasive cells to their less invasive 
counterparts. We examined the pattern of gene expression in these phenotypically 
characterized cells with respect to the weighted discriminator list (Fig. 2b) that defines the 
major cluster of 19 cutaneous melanomas. Strikingly, genes expressed in common in the 
highly invasive uveal melanoma cells (Fig. 2b, inset) were strongly anti-correlated with the 
same genes from the major cluster of cutaneous melanoma samples (Fig. 2b). This 
observation, coupled with the known biological function of genes within the weighted list, 
indicated that specimens assigned within the major cutaneous melanoma cluster (Fig. 1a, b) 
would have reduced motility and reduced invasive ability as they have down-regulation of 
genes related to cell spreading or migration, including formation of focal adhesions (Adams 
"Characterization of cell-matrix adhesion requirements for the formation of fascin 
microspikes" Mol. Biol. Cell 8:2345-2363, 1997; Scott et al. "pp125FAK in human 
melanocytes and melanoma: expression and phosphorylation" Exp. Cell Res. 219:197-203, 
1995; each of which is incorporated herein by reference). Specific genes with reduced 
expression in the major cluster included integrin β1 (Janji et al. "Autocrine TGF-beta- 
regulated expression of adhesion receptors and integrin-linked kinase in HT-144 melanoma 
cells correlates with their metastatic phenotype" Int. J. Cancer 83:255-262, 1999; Hieken et al. 
"Beta1 integrin expression in malignant melanoma predicts occult lymph node metastases" 
Surgery 118:669-673, 1995; each of which is incorporated herein by reference), integrin β3 
(Van Belle et al. "Progression-related expression of beta3 integrin in melanomas and nevi" 
Hum. Pathol. 30:562-567, 1999; incorporated herein by reference), integrin α1 (Hieken et al. 
"Beta1 integrin expression in malignant melanoma predicts occult lymph node metastases" 
Surgery 118:669-673, 1995; incorporated herein by reference), syndecan 4 (Woods et al. 
"Syndecan-4 binding to the high affinity heparin-binding domain of fibronectin drives focal 
adhesion formation in fibroblasts" Arch. Biochem. Biophys. 374:66-72, 2000; incorporated 
herein by reference) and vinculin (Helige et al. "Interrelation of motility, cytoskeletal 
organization and gap junctional communication with invasiveness of melanocyte cells in 
vitro" Invasion Metastasis 17:26-41, 1997; incorporated herein by reference) (Figs 2 and 3; 
see Supplementary Information). In samples outside the major cluster, increased expression 
of fibronectin is particularly interesting. Together with other reports (Maung et al. "Requirement for 
focal adhesion kinase in tumor cell adhesion" Oncogene 18:6824-6828, 1999; Silletti et al. 
"Autocrine motility factor and the extracellular matrix I. Coordinate regulation of melanoma 
cell adhesion, spreading and migration involves focal contact reorganization" Int. J. Cancer 
76:120-128, 1998; each of which is incorporated herein by reference), this observation 
indicates that these cells are induced to secrete this pro-migratory molecule, consistent with 
an important role for focal contacts in modulating melanoma cell motility. 

We then directly tested the prediction from the array results that cell spreading and 
migration could be discordant between melanoma cluster groups. Cutaneous melanomas 
(assigned either in or out of the major cluster) were characterized using a series of cellular 
assays applied to test cell motility and invasiveness (Table 1, Fig. 4). Figure 4 illustrates the 
discordance of cutaneous melanoma samples within the major cluster and those outside this 
group. As predicted from the analysis of their gene expression patterns, melanomas within 
the major cluster had reduced motility (P = 0.0063), invasive ability (P = 0.0055) and 
vasculogenic mimicry in comparison with melanomas outside the major cluster (Table 1). 

The patient population in this study had a uniformly poor prognosis, and neither 
typical clinical factors (for example, age, sex, biopsy site) nor in vitro characteristics (for 
example, passage number) provide strong correlation with clinical outcome, or expression 
information (see Supplementary Information). In contrast, molecular classification of these 
tumors on the basis of gene expression (Fig. 1, Table 1) could identify a previously 
undetected subtype of this cancer. The analyses described here were not designed to address 
the relationship of gene expression profile and clinical outcome in melanoma patients, and 
thus the clinical relevance of our observed subgrouping awaits further analysis. However, 
survival information was available on 15 patients, and the results, though not statistically 
significant, are of interest. Three deaths occurred out of 10 patients in the tight cluster of 19 
while 4 deaths occurred out of 5 patients in the remaining group (log-rank P-value = 0.135). 
Our results indicate melanoma will provide a unique opportunity to study a homogeneous 
group of patients to determine if gene expression patterns predict prognosis or therapeutic 
response in settings where we cannot currently determine who is most at risk for rapid 
disease progression and death. 

Finally, classification of melanoma on the basis of gene expression patterns is 
possible, despite the prevailing view that the 'taxonomy' of this disease falls in a continuous 
spectrum lacking discernible entities. Our data show that melanoma is a useful model to 
identify genes critical for aspects of the metastatic process, including tumour cell motility and 
the ability to form primitive tubular networks that may contribute to tumour perfusion. The 
extent to which melanoma samples can be clinically subdivided by expression patterns 
remains to be elucidated. However, our identification of genes 'weighted' for their ability to 
discriminate a subset of melanomas should provide a sound molecular basis for the dissection 
of other clinically relevant subsets of this tumour. 

Methods 
Samples 

Cultured cells were collected and mRNA isolated as described (Khan et al. "DNA Microarray 
technology: the anticipated impact on the study of human disease" Biochim. Biophys. Acta 
1423:17-28, 1999; www.nhgri.nih.gov//DIR/microarray; each of which is incorporated herein 
by reference). Samples underwent a series of controls for quality of mRNA, labeling and 
hybridization, as well as sample integrity (including genotyping DNA from all samples with 
five dinucleotide markers from four different chromosomes to ensure individuality). The 
entire coding sequence of the p16 gene and exon 3 of the β-catenin gene were sequenced to 
assess the mutation status of all available samples (see Supplementary Information). The 
biopsy tumour specimens used in this study were obtained with Institutional Review Board 
approval and clinical information is provided in the Supplementary Information. Biopsies 
were debrided, dissected into small pieces and frozen in liquid nitrogen. Frozen specimens 
were immediately placed into TRIzol Reagent (Gibco BRL), homogenized and mRNA 
isolated as described (Khan et al. "DNA Microarray Technology: The Anticipated Impact on 
the Study of Human Disease" Biochim. Biophys. Acta 1423:17-28, 1999; 
www.nhgri.nih.gov/DIR/microarray; each of which is incorporated herein by reference). 

Microarrays 

The 8,150 human cDNAs used in this study were obtained under a Cooperative Research and 
Development Agreement with Research Genetics and 6,912 were verified by sequence. This 
set of cDNAs is part of a larger collection (Khan et al. "Gene expression profiling of alveolar 
rhabdomyosarcoma with cDNA microarrays" Cancer Res. 58:5009-5013, 1998; Duggan et 
al. "Expression profiling using cDNA microarrays" Nature Genet. 21:10-14, 1999; 
www.nhgri.nih.gov/DIR/microarray; each of which is incorporated herein by reference). On 
the basis of the UniGene build of 9 March 2000 
(http://www.ncbi.nlm.nih.gov/UniGene/build.html), the 8,150 cDNAs represent 6,971 unique 
genes in this melanoma array. All clones were confirmed by resequencing if necessary. 
Microarrays were hybridized, scanned and image analysis performed as described (Khan et 
al. "Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays" 
Cancer Res. 58:5009-5013, 1998; Khan et al. "DNA Microarray technology: the anticipated 
impact on the study of human disease" Biochim. Biophys. Acta 1423:17-28, 1999; 
www.nhgri.nih.gov/DIR/microarray; each of which is incorporated herein by reference). The 
raw data from the microarray is shown in Appendix A, a Microsoft Excel Worksheet, which 
has been included on a CD-ROM submitted with this application and is incorporated herein 
by reference. 

Statistical methods 

Detailed information on all statistical methods is in the Supplementary Information. 
Agglomerative hierarchical clustering of the 31 melanomas on the basis of their gene 
expression profiles was performed as described (Khan et al. "Gene expression profiling of 
alveolar rhabdomyosarcoma with cDNA microarrays" Cancer Res. 58:5009-5013, 1998; 
Bittner et al. "Data analysis and integration of steps and arrows" Nature Genet. 22:213-215, 
1999; each of which is incorporated herein by reference), to investigate relationships between 
tumour samples. Average linkage was used, as well as a dissimilarity measure of one minus 
the Pearson correlation coefficient of log ratios. The cutoff employed to obtain the observed 
partitioning was 0.54. The MDS was performed using an implementation of MDS in the 
MATLAB package. A non-hierarchical clustering algorithm (Ben-Dor et al. "Clustering 
gene expression patterns" J. Comput. Biol. 6:281-297, 1999; incorporated herein by 
reference) was used to define experimental clusters. This algorithm takes a graph-theoretic 
approach and makes no assumptions about the similarity function or the number of clusters 
sought. 

To generate the weighted gene list, cluster compaction and separation were evaluated. 
For a given clustering result with $n_1 = 19$ and $n_2 = 12$, the discriminative weight of each gene is 
$w = d_B / (k_1 d_{w1} + k_2 d_{w2} + \alpha)$, where $d_B$ is the centre-to-centre distance (between-cluster Euclidean 
distance), $d_{wi}$ is the average Euclidean distance among all sample pairs within cluster $i$, 
$k_i = t_i / (t_1 + t_2)$ for a total of $t_i$ sample pairs in cluster $i$, and $\alpha$ is a small constant (0.1 in our study) 
to prevent the zero denominator case (Fig. 2a). Genes may then be ranked on the basis of $w$. 

In vitro biological assays 

Floating collagen lattices were prepared and used to test selected cell lines for their 
ability to deform the gels as described (Maniotis et al. "Vascular channel formation by 
human melanoma cells in vivo and in vitro: vasculogenic mimicry" Am. J. Pathol. 155:739-752, 
1999; Table 1 legend). Samples were also tested for their ability to migrate into an in 
vitro scratch wound as described (Tamura et al. "Inhibition of cell migration, spreading and 
focal adhesions by tumor suppressor PTEN" Science 280:1614-1617, 1998; incorporated 
herein by reference). Cells were stained with Giemsa, a digital micrograph of the region was 
prepared, and the stained area as a percent of total area in the scraped and open sub-regions 
was estimated by a thresholding procedure using IPLabs Spectrum (Scanalytics, Vienna, 
Virginia) software. Results in Table 1 represent data from 24 h after plating on coverslips 
treated with fibronectin (FN; 10 µg ml⁻¹; Tamura et al. "Inhibition of cell migration, 
spreading and focal adhesions by tumor suppressor PTEN" Science 280:1614-1617, 1998; 
incorporated herein by reference). 

Examples of tubular network formation (associated with vasculogenic mimicry) could 
be observed following seeding of cell lines onto three-dimensional gels of polymerized 
Matrigel or Type 1 collagen (Collaborative Biochemical) as described (Maniotis et al. 
"Vascular channel formation by human melanoma cells in vivo and in vitro: vasculogenic 
mimicry" Am. J. Pathol. 155:739-752, 1999; Table 1). 

Table 1 lists results from high throughput screening for cell migration as the radial 
dispersion of cells from an initial confluent monolayer of 2,000 melanoma cells deposited 
within a 1.0 mm circular area on glass surfaces precoated with FN (100 µg ml⁻¹; Berens et al. 
"The role of extracellular matrix in human astrocytoma migration and proliferation studied in 
a microliter scale assay" Clin. Exp. Metastasis 12:405-415, 1994; Giese et al. "Contrasting 
migratory response of astrocytoma cells to tenascin mediated by different integrins" J. Cell 
Sci. 109:2161-2168, 1996; each of which is incorporated herein by reference). 

Selected cell lines were tested for their ability to invade a defined basement 
membrane matrix. Tumor cells (1 x 10⁵) were seeded into the upper wells of the membrane 
invasion culture system (MICS) chamber (Hendrix et al. "A simple quantitative assay for 
studying the invasive potential of high and low human metastatic variants" Cancer Lett. 
38:137-147, 1987; incorporated herein by reference) onto collagen/laminin/gelatin-coated 
(Sigma) polycarbonate membranes containing 10-µm pores (Osmonics, Livermore, 
California) containing 1x Mito+ Serum Extender (Becton Dickinson). After 24 h of 
incubation at 37°C, the cells that invaded each membrane were collected, stained and counted 
as described (Hendrix et al. "Role of intermediate filaments in migration, invasion and 
metastasis" Cancer Metastasis Rev. 15:507-525, 1996; incorporated herein by reference). 
Percent invasion was corrected for proliferation and calculated as (total number of invading 
cells / total number of cells seeded) x 100. 

Supplement I - Statistical Methods for Clustering of Gene Expression Data and 
Validation of Cluster Predictions 

OVERVIEW: 

To fully appreciate the expression patterns derived from a large number of cDNA microarrays 
and the relationships between melanoma tumor samples, several statistical methods were 
integrated as follows: 

a. A multidimensional scaling (MDS) method was employed to visualize the 
similarity between samples, and a hierarchical clustering dendrogram was produced by an 
implementation of the average-linkage clustering algorithm. 
b. The clustering results were further verified by a non-hierarchical algorithm, CAST 
(Ben-Dor et al. J. Comput. Biol. 6:281-297, 1999; incorporated herein by reference). 
c. To determine the tightness and the statistical significance of the clusters derived 
from the various methods, two independent approaches were used to validate the 
prediction. The first, the $\mathrm{WADP}_k$ method, is a sensitivity analysis of the clustering 
under noise perturbation of the data set. The second compares the discrimination power 
observed for genes in the data to that expected in random data, using TNoM scoring. 
d. After the clustering result was confirmed, each gene was weighted based on its 
discriminative ability for the clusters derived from the previous methods. 
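
The average-linkage clustering in step a, with one minus the Pearson correlation of log ratios as the dissimilarity, can be illustrated with a short sketch. This is not the implementation used in the study: the function names, the dictionary input format and the naive merge loop are assumptions made for illustration, and genes or samples with zero variance are not handled. Stopping the merges once the closest pair of clusters reaches the cutoff corresponds to cutting the dendrogram at that height (0.54 in the Statistical methods section).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage_clusters(samples, cutoff):
    """Agglomerate samples until the closest pair of clusters is >= cutoff.

    samples: dict mapping sample name -> list of log ratios (one per gene).
    Dissimilarity between samples is 1 - Pearson correlation; dissimilarity
    between clusters is the mean over all cross-cluster sample pairs.
    """
    names = list(samples)
    dist = {
        frozenset((a, b)): 1.0 - pearson(samples[a], samples[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage(clusters[i], clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d >= cutoff:
            break  # equivalent to cutting the dendrogram at this height
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters
```

The loop recomputes every pairwise linkage at each step, which is adequate for 31 samples but would be slow for large data sets.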

In the following section, detailed descriptions of the methods listed in steps c and d will be 
presented. For some of the more standard methods, such as MDS, average-linkage clustering, 
and CAST, we refer readers to the literature (Ben-Dor et al. J. Comput. Biol. 6:281-297, 
1999; Eisen et al. Proc. Natl Acad. Sci. USA 95:14863-14868, 1998; Everitt Cluster Analysis 
(London: Edward Arnold), 1993; each of which is incorporated herein by reference). Since 
not all genes were readily detectable by the array method, a subset of the total number of 
surveyed genes was analyzed in all cases. A set of 3613 genes was chosen for analysis. The 
genes were chosen by an empirically derived set of criteria requiring an average mean 
intensity above background of the least intense signal (Cy3 or Cy5) across all experiments 
of >2000 arbitrary units, and an average spot size across all experiments of >30 pixels. To 
avoid distortions of the data resulting from ratios where the signal in one channel is large and 
the signal in the other channel is undetectable, ratios higher than 50 or lower than 0.02 were 
truncated to 50 or 0.02, respectively, for these analyses. 
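
The gene selection criteria and the ratio truncation described above can be written as two small helper functions. The function and parameter names are hypothetical; only the thresholds (2000 arbitrary units, 30 pixels, and the ratio limits 0.02 and 50) come from the text.

```python
def passes_quality_filter(mean_min_intensity, mean_spot_size,
                          min_intensity=2000.0, min_spot_size=30.0):
    """Return True if a gene meets both selection criteria.

    mean_min_intensity: average over all experiments of the background-
        subtracted intensity of the weaker channel (Cy3 or Cy5).
    mean_spot_size: average spot size over all experiments, in pixels.
    """
    return mean_min_intensity > min_intensity and mean_spot_size > min_spot_size


def truncate_ratio(ratio, low=0.02, high=50.0):
    """Clamp an expression ratio to the interval [low, high]."""
    return max(low, min(high, ratio))
```

For example, `truncate_ratio(120.0)` returns 50.0 and `truncate_ratio(0.001)` returns 0.02, while ratios inside the interval are returned unchanged.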

Description of the $\mathrm{WADP}_k$ method for testing the validity of cluster predictions 

Hierarchical clustering of the 31 melanoma samples was performed, resulting in a 
dendrogram (Fig. 1b). Although the dendrogram gives insights about the similarity and 
relatedness among samples, it does not indicate robustness to variability associated with the 
assay, sampling, etc. In order to draw valid conclusions about the clustering structure present 
in the data, it is necessary to investigate how variability affects the results of the cluster 
analysis. To this end, we developed and implemented a method that determines the 
reproducibility of given levels of cluster structure within the dendrogram under the condition 
of added noise. The method is described below. 

First, cut the original dendrogram at a height that results in $k$ clusters and let $N_k$ denote 
the number of clusters containing 2 or more elements. Let $M_i$ represent the number of pairs of 
elements in the $i$th of the $N_k$ clusters. Next, perturb the data by adding to every log-ratio of 
each sample an independent random deviate generated from the $N(0, \sigma^2)$ distribution. Cluster 
the perturbed data and cut the resulting dendrogram at a height that again results in $k$ clusters. 
For the $M_i$ pairs of elements in the $i$th original cluster, record the number of those pairs, $D_i$, 
that do not remain together in the clustering of the perturbed data. Next, calculate the overall 
discrepancy rate for the clustering: $(D_1 + D_2 + \ldots + D_{N_k}) / (M_1 + M_2 + \ldots + M_{N_k})$. This 
overall discrepancy rate is a weighted average of the $N_k$ cluster-specific discrepancy rates 
(i.e., $D_i/M_i$, for $i = 1, 2, \ldots, N_k$), with weights proportional to the number of pairs in individual 
clusters. Finally, repeat the calculations over many perturbations of the original data set and 
report the average overall discrepancy rate (termed the Weighted Average Discrepant Pairs 
for $k$ clusters, or $\mathrm{WADP}_k$). The above procedure is repeated for all possible cuts of the 
original dendrogram and $\mathrm{WADP}_k$ is plotted versus $k$. Minima of the $\mathrm{WADP}_k$ curve are 
interpreted as indicating reproducible levels of structure. 
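
The pair-counting step of this procedure can be sketched as follows, assuming the cluster assignments of the original and perturbed data at a given k are already available. The function names and the dictionary representation of a clustering are assumptions made for illustration; the clustering and perturbation steps themselves are not shown.

```python
from itertools import combinations


def discrepancy_rate(original, perturbed):
    """Overall discrepancy rate between two clusterings of the same samples.

    original, perturbed: dicts mapping sample name -> cluster label.
    Returns (D_1 + ... + D_Nk) / (M_1 + ... + M_Nk); original clusters with a
    single member contribute no pairs and are therefore ignored.
    """
    members = {}
    for sample, label in original.items():
        members.setdefault(label, []).append(sample)

    total_pairs = 0   # sum of M_i over the original clusters
    broken_pairs = 0  # sum of D_i: pairs separated after perturbation
    for cluster in members.values():
        for a, b in combinations(cluster, 2):
            total_pairs += 1
            if perturbed[a] != perturbed[b]:
                broken_pairs += 1
    if total_pairs == 0:
        raise ValueError("no original cluster contains two or more samples")
    return broken_pairs / total_pairs


def wadp(original, perturbed_runs):
    """Average the overall discrepancy rate over many perturbed clusterings."""
    rates = [discrepancy_rate(original, run) for run in perturbed_runs]
    return sum(rates) / len(rates)
```

Cluster labels only need to be consistent within one clustering; the comparison asks whether a pair stays together, so the labels of the original and perturbed clusterings need not match.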

The parameter $\sigma$ represents the noise standard deviation inherent to the system. As 
mentioned above, the noise is composed of, at the least, assay variability and sampling 
variability. $\sigma$ is unknown and must be estimated. The method we use for estimating $\sigma$ is to 
compute the variance of the log-ratio of each gene across all samples. We then use the 
median of the empirical distribution of these variances as an estimate of $\sigma^2$. It may be more 
appropriate to use a smaller value (say the tenth percentile of the empirical distribution), if it 
were believed that a large percentage of genes present on the array were truly differentially 
expressed within the population of samples hybridized. 
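
A minimal sketch of the noise estimate and the perturbation step follows. It assumes the sample variance (n - 1 denominator) for each gene, which the text does not specify, and it treats the median as an estimate of the variance, so the standard deviation of the added noise is its square root. Function names and the list-of-lists layout are illustrative only.

```python
import random
from math import sqrt
from statistics import median, variance


def noise_variance_estimate(log_ratios_by_gene):
    """Median of the per-gene variances of log-ratios across samples.

    log_ratios_by_gene: list of lists, one inner list of log-ratios per gene.
    """
    return median(variance(values) for values in log_ratios_by_gene)


def perturb(log_ratios_by_gene, sigma2, rng):
    """Add an independent N(0, sigma2) deviate to every log-ratio."""
    sd = sqrt(sigma2)
    return [[v + rng.gauss(0.0, sd) for v in values]
            for values in log_ratios_by_gene]
```

Passing a seeded `random.Random` instance as `rng` makes a set of perturbations reproducible.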

Description of the TNoM method for the cluster significance based on random partition. 

Threshold number of misclassification, or TNoM score, is a simple threshold-based 
method that uses a given expression level, for a given gene, to predict the cluster label of a 
given test sample. In the present study, we have 31 samples from 2 groups. Therefore, we can 
label the samples by $l_i$, $i = 1, \ldots, m$, where $l_i \in \{0,1\}$ and $m = 31$. For the $k$th gene, let $\langle x_i, l_i \rangle_k$ 
be its expression pattern (or ratios in this study) and corresponding cluster labels. A threshold 
function is defined as 

$$f_{h,a}(x) = \begin{cases} a, & \text{if } x < h \\ 1 - a, & \text{otherwise} \end{cases}$$

where $h$ is a threshold value, and $a \in \{0,1\}$. For a given $h$ and $a$ we can assign the label 
$f_{h,a}(x_i)$ to the $i$th sample. The number of misclassifications entailed by this scheme is 
$\sum_{i=1}^{m} |l_i - f_{h,a}(x_i)|$. 

The TNoM score for the $k$th gene, $s_k$, is defined as the minimum error achieved over all 
possible choices of $h$ and $a$, 

$$s_k = \min_{h,a} \sum_{i=1}^{m} \left| l_i - f_{h,a}(x_i) \right|$$

The minimization step is accomplished by exhaustively searching all $2(m+1)$ possibilities. 
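
The exhaustive search can be sketched directly from the definition. The function name and argument layout are assumptions; the candidate thresholds are the distinct expression values plus positive infinity, which yields every distinct split of the samples into "below h" and "not below h" (at most m + 1 thresholds, each tried with a = 0 and a = 1).

```python
def tnom_score(values, labels):
    """Minimum number of misclassifications over all thresholds h and a in {0, 1}.

    values: expression ratios x_i of one gene, one per sample.
    labels: cluster labels l_i in {0, 1}, in the same sample order.
    The rule assigns label a when x < h and 1 - a otherwise.
    """
    thresholds = sorted(set(values)) + [float("inf")]
    best = len(values)
    for h in thresholds:
        for a in (0, 1):
            errors = sum(
                1 for x, label in zip(values, labels)
                if (a if x < h else 1 - a) != label
            )
            best = min(best, errors)
    return best
```

Because the smallest threshold puts no sample below h, the rule can always assign every sample to the larger group, so the score never exceeds the size of the smaller group (12 in this study).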

To examine the significance of the groups derived by the clustering algorithm, we used three 
steps. First, we evaluated TNoM scores for all genes found in the data set. Then, the number 
of genes that have a TNoM score less than or equal to s, for s = 0, ..., 12 (where 12 is the 
maximum number of misclassifications any classification rule may commit), was listed. Next, we 
randomly assigned cluster labels to all samples to form two arbitrary groups of 19 and 12 
samples. The TNoM score was again evaluated for each gene. A list of the number of genes 
that have a TNoM score less than or equal to s was similarly obtained. We repeated this process 
50 times to observe random fluctuations and their range of scores. Finally, the expected 
number of genes resulting in s or fewer misclassifications under the assumption of perfectly 
random gene expression patterns can be calculated (Ben-Dor et al., submitted for 
publication). As expected, the values produced by the 50 random samplings are close to those 
produced by the rigorous theoretical calculation. The significance of the suggested clusters is 
reflected in the overabundance of genes with low TNoM scores. More precisely, a 
meaningful partition will produce far more genes with low TNoM scores than a random one. 
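
The random-label comparison can be sketched as below. This illustrates the empirical part of the procedure only; the theoretical expectation cited from Ben-Dor et al. is not reproduced. All function names, the seed argument and the data layout are assumptions, and a compact TNoM routine is repeated so that the block is self-contained.

```python
import random


def tnom_score(values, labels):
    """Minimum misclassifications of the rule 'a if x < h else 1 - a'."""
    thresholds = sorted(set(values)) + [float("inf")]
    return min(
        sum(1 for x, l in zip(values, labels) if (a if x < h else 1 - a) != l)
        for h in thresholds for a in (0, 1)
    )


def cumulative_counts(scores, max_s):
    """Number of genes with a TNoM score <= s, for s = 0, ..., max_s."""
    return [sum(1 for sc in scores if sc <= s) for s in range(max_s + 1)]


def random_partition_counts(genes, n_ones, max_s, n_repeats=50, seed=0):
    """Average cumulative counts over random relabelings of the samples.

    genes: list of per-gene expression lists (same sample order in each).
    n_ones: number of samples placed in the group labeled 1.
    """
    rng = random.Random(seed)
    n_samples = len(genes[0])
    labels = [1] * n_ones + [0] * (n_samples - n_ones)
    totals = [0] * (max_s + 1)
    for _ in range(n_repeats):
        rng.shuffle(labels)
        counts = cumulative_counts([tnom_score(g, labels) for g in genes], max_s)
        totals = [t + c for t, c in zip(totals, counts)]
    return [t / n_repeats for t in totals]
```

Comparing `cumulative_counts` for the observed partition with `random_partition_counts` for groups of the same sizes shows whether low scores are overabundant.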

Description of the weighting method based on a gene's discriminative ability. 

The clustering algorithms described in the text produced one tightly bonded cluster of 
$n_1 = 19$ samples, and we assume the remaining $n_2 = 12$ samples form another cluster. For a given 
two-cluster setting, a discriminative weight for each gene can be evaluated by 

$$w = \frac{d_B}{k_1 d_{W_1} + k_2 d_{W_2} + \alpha}$$

where $d_B$ is the center-to-center distance (between-cluster Euclidean distance), $d_{W_i}$ is the 
average Euclidean distance among all sample pairs within cluster $i$, with totals of $t_1$ and $t_2$ 
sample pairs for clusters 1 and 2, respectively, $k_1 = t_1/(t_1+t_2)$, and $k_2 = t_2/(t_1+t_2)$. $\alpha$ is a 
small constant (0.1 in our study) to prevent the zero denominator case. Genes may then be 
ranked on the basis of $w$. The equation for the weight $w$ is designed not only to evaluate the 
discriminative ability of a single gene, but also to evaluate the discriminative ability of 2 or 
more genes together. If the second group of samples is not assumed to be a tight cluster, the 
$d_{W_2}$ term can be dropped. 
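
The weight can be computed as in the sketch below, assuming each sample is represented by a point whose coordinates are the log-ratios of the gene or genes being scored (a single coordinate for one gene). The helper names are hypothetical, and each cluster must contain at least two samples so that its within-cluster average is defined.

```python
from itertools import combinations
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def mean_pairwise_distance(points):
    """Average distance over all sample pairs, and the number of pairs t."""
    pairs = list(combinations(points, 2))
    return sum(euclidean(u, v) for u, v in pairs) / len(pairs), len(pairs)


def discriminative_weight(cluster1, cluster2, alpha=0.1):
    """w = d_B / (k_1 * d_W1 + k_2 * d_W2 + alpha) for a two-cluster setting."""
    d_b = euclidean(centroid(cluster1), centroid(cluster2))
    d_w1, t1 = mean_pairwise_distance(cluster1)
    d_w2, t2 = mean_pairwise_distance(cluster2)
    k1, k2 = t1 / (t1 + t2), t2 / (t1 + t2)
    return d_b / (k1 * d_w1 + k2 * d_w2 + alpha)
```

With 19 and 12 samples the pair counts are 171 and 66, so the tight cluster dominates the denominator. A gene whose values are identical within each cluster and differ by 1 between clusters receives the weight 1/0.1 = 10.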

Supplement II - Statistical Analysis of Clinical and Culture Characteristics of 
Melanoma Clusters 

SUMMARY REPORT: 

Thirty-one tissue specimens were clustered using the Bioclust clustering algorithm 
(see text), resulting in one tight cluster of 19 specimens (Group A) and 12 specimens that 
showed no specific clustering pattern (Group B). Statistical tests were performed to 
determine whether any clinical or tumor cell characteristics were specifically associated with 
cluster group. For categorical variables we created a contingency table and used Fisher's 
exact test to compute a p-value (the Chi-square test was not used because each table had at 
least one expected cell frequency less than 5). For continuous and ordered variables, we used 
the Wilcoxon two-sample (rank-sum) test, a non-parametric alternative to the two-sample t 
test. Tests were performed in S-plus 4.5 and StatXact 3.1. 
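
For a 2 x 2 contingency table, the two-sided Fisher's exact test can be sketched in a few lines. This is an illustration, not the S-plus or StatXact procedure used in the study; the function name is an assumption, and tables with three or more rows (such as mutation status or biopsy site) would need a generalization that is not shown. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one, given the fixed margins.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Unnormalized hypergeometric weights; integers keep the comparison exact.
    weights = {
        x: comb(row1, x) * comb(row2, col1 - x)
        for x in range(lowest, highest + 1)
    }
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / sum(weights.values())
```

Applied to the sex-by-group counts for all specimens (F: 4 and 4; M: 12 and 7), `fisher_exact_2x2(4, 4, 12, 7)` gives approximately 0.675, in line with the reported p-value of 0.6754.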



The two groups consisted of the following patient IDs: 

Group A 

M93-007    M91-054    UACC091    UACC502 
UACC1256   UACC127    UACC253    M92-001 
UACC457    UACC383    UACC309    A-375 
UACC1022   TD1376-3   TD1683     TD1720 
TD1384     TD1730     TC1376-3 

Group B 

HA-A       UACC827    UACC1529 
UACC647    UACC930    M93-047 
UACC2837   TC-F027    WM1791C 
UACC1012   UACC1097   UACC903 



As noted in the text, two pairs of specimens in Group A were derived from the same 
patient. The two pairs are M93-007 & M92-001 and TD1376-3 & TC1376-3. In our 
analyses, we only considered the data for each of these patients once or, as specifically noted, 
entirely removed the specimens for these patients from the analysis. 

We first performed an analysis that included all specimen types (tissues and cell 
lines). We tested for associations between group and the following variables: sex, age, 
mutation status, biopsy site*, pigment, Breslow thickness, Clark level, and specimen type. 
None of the variables tested was associated with cluster group (at the 
0.05 significance level). 

Although there was not a statistically significant association between group and 
specimen type (p=0.106), it was noteworthy that all 5 tissue specimens were located in Group 
A. We therefore performed another analysis in which we only considered data from cell 
lines. In the analysis of cell lines, no variables were associated with cluster group at the 0.05 
significance level, although "age" did have a marginal association (p=0.0812). Passage 
number was also tested in this analysis and had no association with group (p=0.8570). 

* Biopsy site was broken down into the following three categories: skin/external (including ankle, 
abdomen/chest, shoulder, breast, neck/forehead and back), internal (including chest wall, distal ileum, 
paraspinous, thyroid lobe, small bowel, rectus muscle and intra-abdominal), and lymph nodes (including 
axillary, cervical and thigh femoral). 

Next, we investigated differences in survival between the two cluster groups. We 
used a measure of survival that indicated survival time from the date of biopsy. Four cases 
(including the previous two) had a biopsy date falling in 1998 and a known status (alive or 
dead) for which a specific date of death or last follow-up was unknown. In order to use these 
cases in the survival analysis, the survival/follow-up time in these cases was arbitrarily set to 
1 year if the biopsy date occurred prior to 7/1/98 or 0.5 years if the biopsy date occurred on or 
after 7/1/98. 

The data used in the survival analysis are shown in Figure 1. A total of 15 cases were 
included in the analysis, 10 from Group A and 5 from Group B. Survival/follow-up times 
were rounded to the nearest quarter year. A Kaplan-Meier survival plot was created and a 
log-rank test was performed. No statistically significant association between group and survival was 
found (p=0.135). 
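
The follow-up rule for the cases with unknown dates and the Kaplan-Meier estimate can be sketched as follows. The function names are assumptions, the log-rank test is not implemented, and the example values are the Group B rows of the survival table in the SURVIVAL ANALYSIS section.

```python
from datetime import date


def assigned_followup_years(biopsy_date):
    """Follow-up time used when the date of death or last follow-up is unknown."""
    return 1.0 if biopsy_date < date(1998, 7, 1) else 0.5


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: survival/follow-up time per case, in years.
    events: 1 if the case died at that time, 0 if alive (censored).
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Group B rows of the survival table: UACC827, UACC930, M93-047, TC-F027, UACC903.
group_b_curve = kaplan_meier([0.5, 2.25, 6, 1, 0.25], [1, 1, 0, 1, 1])
```

Censored cases stay in the risk set until their follow-up time and then drop out without lowering the curve, which is why the one Group B case alive at 6 years leaves the final estimate above zero.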

The analyses performed resulted in no significant association with cluster group. 
However, this does not necessarily mean associations do not exist between the groups and the 
clinical and tumor characteristics tested. The power of the tests we performed is limited by 
the amount of data available for each variable. For example, only 6 specimens in Group A 
and 3 in Group B have information on Breslow thickness. Finding significant associations 
with so few data is unlikely. The power of the tests would increase with more complete data 
on the existing specimens and by the addition of new specimens to the data set. Such studies 
are underway in our laboratory. 

ANALYSIS OF ALL SPECIMENS: 

Group A = specimens that cluster; Group B = others. 

Two pairs of specimens in Group A (M93-007/M92-001 & TD1376-3/TC1376-3) were 
derived from the same patient. The clinical and tumor characteristics for each of these 
patients are only considered once in the below analyses. 

SEX - no statistically significant association with group 

Contingency table with Fisher's exact test 
                A     B 
F               4     4     p-value = 0.6754 
M              12     7     alternative hypothesis: two-sided 

AGE - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.1397 
data: x: age w/group = A, and y: age w/group = B 
Mann-Whitney Statistic: W = 102.0, n=15, m=10 
alternative hypothesis: two-sided 

MUTATION STATUS - no statistically significant association with group 

Contingency table with Fisher's exact test 
                A     B 
mutated         2     4     p-value = 0.1713 
deleted         6     1     alternative hypothesis: two-sided 
WT              4     2 

Contingency table with Fisher's exact test 
Combined mutated and deleted into one category. 
                A     B 
mut./del.       8     5     p-value = 1 
WT              4     2     alternative hypothesis: two-sided 

BIOPSY SITE - no statistically significant association with group 

Contingency table with Fisher's exact test 
                A     B 
skin/external   3     3     p-value = 0.8763 
internal        4     3     alternative hypothesis: two-sided 
LN              7     4 

PIGMENT - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.2631 
Pigment Type: light=1, med=2, dark=3 
(amelanotic = light; tan = med; pigmented = dark.) 
data: x: pig. type w/group = A, and y: pig. type w/group = B 
Mann-Whitney Statistic: W = 76.5, n=13, m=9 
alternative hypothesis: two-sided 

BRESLOW THICKNESS - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.2619 
data: x: thickness w/group = A, and y: thickness w/group = B 
Mann-Whitney Statistic: W = 14.0, n=6, m=3 
alternative hypothesis: two-sided 

CLARK LEVEL - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.4481 
Clark level: II=2, III=3, IV=4 
data: x: Clark level w/group = A, and y: Clark level w/group = B 
Mann-Whitney Statistic: W = 19.5, n=6, m=5 
alternative hypothesis: two-sided 

For the below analysis, the two pairs of specimens in Group A derived from the same patient 
(M93-007/M92-001 & TD1376-3/TC1376-3) were removed. 

SPECIMEN TYPE - no statistically significant association with group 

Contingency table with Fisher's exact test 
                A     B 
cell line      11    12     p-value = 0.106 
tissue          4     0     alternative hypothesis: two-sided 

ANALYSIS OF CELL CULTURES: 

Group A = specimens that cluster; Group B = others. 
A pair of cell lines in Group A (M93-007/M92-001) was derived from the same patient. The 
clinical and tumor characteristics for this patient are only considered once in the below 
analyses. 

SEX - no statistically significant association with group 

Contingency table with Fisher's exact test 
                A     B 
F               4     4     p-value = 1 
M               8     7     alternative hypothesis: two-sided 

AGE - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.0812 
data: x: age w/group = A, and y: age w/group = B 
Mann-Whitney Statistic: W = 80.0, n=11, m=10 
alternative hypothesis: two-sided 

MUTATION STATUS - no statistically significant association with group 

Contingency table with Fisher's exact test 
                A     B 
mutated         2     4     p-value = 0.1713 
deleted         6     1     alternative hypothesis: two-sided 
WT              4     2 

Contingency table with Fisher's exact test 
Combined mutated and deleted into one category. 
                A     B 
mut./del.       8     5     p-value = 1 
WT              4     2     alternative hypothesis: two-sided 

BIOPSY SITE - no statistically significant association with group 

Contingency table with Fisher's exact test 
                A     B 
skin/external   2     3     p-value = 0.7272 
internal        2     3     alternative hypothesis: two-sided 
LN              6     4 

PIGMENT - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.4212 
Pigment Type: light=1, med=2, dark=3 
(amelanotic = light; tan = med; pigmented = dark.) 
data: x: pig. type w/group = A, and y: pig. type w/group = B 
Mann-Whitney Statistic: W = 50.5, n=9, m=9 
alternative hypothesis: two-sided 

BRESLOW THICKNESS - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.2000 
data: x: thickness w/group = A, and y: thickness w/group = B 
Mann-Whitney Statistic: W = 8.0, n=3, m=3 
alternative hypothesis: two-sided 

CLARK LEVEL - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.6349 
Clark level: II=2, III=3, IV=4 
data: x: Clark level w/group = A, and y: Clark level w/group = B 
Mann-Whitney Statistic: W = 13.0, n=4, m=5 
alternative hypothesis: two-sided 

For the below analysis, the pair of specimens derived from the same patient in Group A 
(M93-007/M92-001) was removed. 

PASSAGE NUMBER - no statistically significant association with group 

Wilcoxon rank-sum test: p-value = 0.8570 
Passage #'s for established cell lines were set equal to 21. 
data: x: passage # w/group = A, and y: passage # w/group = B 
Mann-Whitney Statistic: W = 34.0, n=8, m=8 
alternative hypothesis: two-sided 

Contingency table with Fisher's exact test 
                A     B 
1-5             3     4     p-value = 0.8695 
6-10            4     2     alternative hypothesis: 
11-20           4     5 
>20             1     1 

SURVIVAL ANALYSIS: 

Data used in the survival analysis: 

Pt.ID       Group   Status   Time 
M93-007     A       0        7 
M91-054     A       0        7 
UACC091     A       0        7 
UACC502     A       1        0.5 
UACC2534    A       1        0.25 
TD1683      A       1        1 
TD1720      A       0        0.5 
TD1348      A       0        5 
TD1730      A       0        0.5 
TC1376-3    A       0 
UACC827     B       1        0.5 
UACC930     B       1        2.25 
M93-047     B       0        6 
TC-F027     B       1        1 
UACC903     B       1        0.25 

Status: 0 = alive, 1 = dead 
Time is in years. 



Example 2-Expression of Wnt5a in Cell Lines with Originally Low Level Expression 

Wnt5a scored very high among all the marker genes analyzed in its ability to 
discriminate between highly invasive malignant melanoma and less invasive melanoma. 
Melanoma samples with high levels of Wnt5a expression were more aggressive tumors than 
those with lower levels of Wnt5a expression. Figure 6 shows the top 22 genes selected for 
their ability to distinguish highly invasive malignant melanoma from less invasive melanoma. 
Wnt5a is at the top of the list of these marker genes. 

Figure 6 also shows Wnt5a's expected signaling pathway in contrast to the Wnt1 
pathway. Wnt1 is known to be transforming; however, its proximal methods of signaling are 
very different from those of Wnt5a. In some studies, researchers have observed that the two 
pathways seem to oppose each other in terms of downstream effects. In the Wnt5a pathway, 
the first transduction of the Wnt5a signal is accomplished through the interaction of Wnt5a 
with a G protein-coupled receptor, frizzled 5 (FZD5). The signal is subsequently transduced 
through the PLC/IP3/DAG/PKC pathways. The Wnt5a signal eventually leads to integrin 
interactions, cytoskeletal effects, and other cellular effects. 

Low level expression of Wnt5a in the cluster of 19 melanomas was verified by real 
time PCR. Data for the samples WM-1791C and UACC-1273 are shown in Figure 7. The 
real time PCR results show that there is much more Wnt5a transcript in cell line WM-1791C, 
which originally was scored as having high level expression of Wnt5a by gene chip analysis, 
than in UACC-1273, which was originally scored as having low level expression. Vectors 
used to express higher levels of Wnt5a in cells that normally express low levels were 
developed using standard techniques to see if the phenotype of less aggressive samples 
expressing low levels of Wnt5a could be changed. A derivative of UACC-1273, 
transfectant 4-3, which had been transfected with this vector, shows an intermediate level of 
Wnt5a expression in the real time PCR analysis. The increase in Wnt5a expression carries 
over to WNT5A protein abundance, as shown by Western blot and by immunohistochemical 
staining (nuclei staining blue, WNT5A staining red) (Figure 7). 

In terms of morphology, cell lines with originally low levels of Wnt5a expression 
showed dramatic changes in morphology and cytoskeletal organization when stably 
transfected with a vector driving Wnt5a expression. The parental line, UACC-1273, is 
spindle shaped with few points of attachment to the culture plate and disorganized actin 
filaments (Figure 8). The transfectants are broader and flatter with many extensions and 
highly polarized actin filaments. 

In order to determine whether there was cross talk between the Wnt5a and Wnt1 
pathways, an assay looking at beta-catenin was used. When Wnt1 signaling is active, 
beta-catenin is localized to the nucleus. In Figure 9, antibody staining for beta-catenin shows that 
the beta-catenin is localized in the cytoplasm and not concentrated in the nucleus. Therefore, 
no cross talk between the two pathways seems to be occurring. 

Protein kinase C (PKC), a downstream target likely to be modulated by Wnt5a, was 
also examined. Wnt5a modulates PKC activity by phosphorylation of some or all of the PKC 
isoforms and not by alteration of PKC transcript levels. As can be seen in Figure 9, increased 
phosphorylated PKC is produced in the transfectants expressing significant levels of the 
Wnt5a transcript, as expected. The isoforms most frequently phosphorylated are mu and 
alpha/beta. This is further evidence that one is looking at the expected Wnt5a pathway. 
PKC is one of the central hubs of signal transduction, and pathways leading to many types of 
cellular action, including proliferation, cytoskeletal organization, and cell movement, are 
known. 

Increased cell movement and invasiveness were also found to correlate with increased 
Wnt5a expression in a scratch assay and a Boyden chamber assay. Transfectants expressing 
increased levels of Wnt5a show increased competence in filling in open gaps on a cell culture 
dish when compared to cells of the parent cell line (Figure 10). Increased phosphorylated 
PKC was found to correlate with increasing cell invasiveness as measured by a standard test 
for invasiveness, the Boyden chamber assay. 

The first transduction of the Wnt5a signal is accomplished through interaction with a 
G protein-coupled, seven-transmembrane receptor, frizzled 5. The various cell lines tested 
show varying native levels of fzd5 transcript. In the cell line UACC-1273, the transition 
from low to high Wnt5a expression is not associated with increasing amounts of the receptor. 
The use of an antibody to fzd5 prevents the receptor from responding to Wnt5a and thereby attenuates 
or reverses the phenotypes that increased Wnt5a would normally produce. This is shown in 
the decreased level of phosphorylated PKC upon treatment with the anti-fzd antibody and in 
the decreased invasiveness of Wnt5a transfectants treated with the anti-fzd antibody. 

Other Embodiments 

The foregoing has been a description of certain non-limiting preferred embodiments 
of the invention. Those of ordinary skill in the art will appreciate that various changes and 
modifications to this description may be made without departing from the spirit or scope of 
the present invention, as defined in the following claims. 